Abstract: In unstructured orchard environments, where fruits overlap and are occluded, deep-learning recognition systems for apple-picking robots rely on complex network structures with large parameter counts, which severely limits the response speed of the detection model. To address this problem, a lightweight real-time apple detection method for embedded platforms, YOLO v4-CA, was proposed, with YOLO v4 as the base framework. The method used MobileNet v3 as the feature extraction network and introduced depthwise separable convolution into the feature fusion network to reduce computational complexity. To maintain detection accuracy, coordinate attention was introduced at key positions in the network to strengthen attention on targets, which improved the detection of dense targets and the resistance to background interference. Because the apple dataset was small, a transfer learning strategy combining cross-domain and in-domain transfer was proposed to improve the generalization ability of the model. Experimental results showed that the average precision of the improved model was 92.23%, and its detection speed on the embedded hardware platform was 15.11 frames per second, about three times that of the original YOLO v4 model. Compared with two representative object detection algorithms, SSD300 and Faster R-CNN, the average precision increased by 0.91 and 2.02 percentage points, respectively, and the detection speed on the embedded hardware platform was about 1.75 times and 12 times theirs, respectively. Compared with two lightweight object detection algorithms, DY3TNet and YOLO v5s, the average precision increased by 7.33 and 7.73 percentage points, respectively.
Therefore, the improved YOLO v4-CA model can detect apples efficiently and in real time in complex orchard environments and is suitable for deployment on embedded systems, providing a solution for the recognition system of apple-picking robots.
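
To illustrate why replacing standard convolutions with depthwise separable ones reduces the cost of the feature fusion network, the following minimal sketch compares the weight counts of the two layer types. The function names and the layer sizes (3×3 kernel, 256 input channels, 512 output channels) are hypothetical choices for illustration and are not taken from the YOLO v4-CA architecture; bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to illustrate the ratio.
    k, c_in, c_out = 3, 256, 512
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.4f}")
```

In general the ratio equals 1/c_out + 1/k², so for 3×3 kernels and wide layers the separable form needs roughly one ninth of the weights of the standard form.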