Abstract: Pepper trees bear large quantities of fruit, but their crisscrossed branches and dense foliage pose significant challenges for automated peppercorn picking. Therefore, a fast identification and localization method for pepper clusters in complex environments, based on an improved YOLO v5, was proposed. Efficient channel attention (ECA) was added after the CSPLayer of the backbone feature extraction network CSPDarknet and after the upsampling layer of the neck to simplify the computation of the CSPLayer and improve feature extraction capability. Coordinate attention (CA) was added to the downsampling layer to reduce information loss during downsampling and strengthen the spatial information of the features; combined with the Grad-CAM heat map and the depth map of the point cloud, this completed the spatial localization of pepper clusters. The test results showed that, compared with the original YOLO v5, the improved network reduced the number of residual computations to one, which kept the model lightweight and improved efficiency. Under the same frame number interval, the accuracy of the improved network was 96.27%, which was 5.37, 3.35, and 15.37 percentage points higher than that of three similar feature extraction networks, YOLO v5, YOLO v5-tiny, and Faster R-CNN, respectively, and the ability to separate and recognize pepper clusters on successive plants was greatly improved. The experimental results showed that the average detection accuracy of the system in the natural environment was 81.60% and the missed detection rate was 18.39%, which meets the requirements of pepper cluster recognition and lays a foundation for mobile deployment.
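
As a rough illustration of the channel attention step described above, the following sketch implements an ECA-style reweighting in plain Python: global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise scaling. The function name `eca`, the nested-list tensor layout, and the hand-picked kernel weights are assumptions made for illustration only. In a trained network the kernel weights are learned, and this is not the authors' implementation.

```python
import math


def eca(feature_map, kernel):
    """ECA-style channel reweighting (illustrative sketch).

    feature_map: list of C channels, each an H x W nested list of floats.
    kernel: odd-length list of 1D convolution weights (hypothetical values;
            learned during training in a real network).
    Returns (scaled_feature_map, channel_weights).
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2

    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

    # 2) 1D convolution across the channel axis with zero padding, so each
    #    channel interacts only with its k nearest neighbours.
    padded = [0.0] * pad + pooled + [0.0] * pad
    logits = [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(pooled))
    ]

    # 3) Sigmoid gate turns each logit into a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-z)) for z in logits]

    # 4) Rescale every value in a channel by that channel's weight.
    scaled = [
        [[value * w for value in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]
    return scaled, weights


if __name__ == "__main__":
    # Three 1x1 channels; the kernel mixes each channel with its neighbours.
    fmap = [[[1.0]], [[2.0]], [[3.0]]]
    scaled, weights = eca(fmap, [1.0, 1.0, 1.0])
    print(weights)
    print(scaled)
```

Because the gate needs only `k` weights regardless of the number of channels, this kind of attention adds very little computation, which is consistent with the lightweight design goal stated in the abstract.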