Abstract: Accurate identification of multi-category targets in tomato images is a technical prerequisite for automatic picking. To address the low segmentation accuracy and large parameter counts of existing networks, a multi-category segmentation method for tomato images based on an improved DeepLabv3+ was proposed. The method combined GhostNet with coordinate attention (CA) to construct CA-GhostNet as the backbone feature extraction network of DeepLabv3+, reducing the number of network parameters. A multi-branch decoding structure was designed to improve segmentation accuracy for small-target categories. Weights pre-trained on a synthetic dataset were then used for transfer learning on small-sample monocular and binocular datasets. Eight semantic categories, including fruit, trunk, branch and thin line, were segmented. The results showed that the mean intersection over union (MIoU) and mean pixel accuracy (MPA) of the improved DeepLabv3+ model were 68.64% and 78.59%, respectively, on the monocular dataset, and 73.00% and 80.59% on the binocular dataset. In addition, the memory footprint of the proposed model was only 18.5 MB, and the inference time for a single image was 55 ms. Compared with the baseline model, the MIoU on the monocular and binocular datasets increased by 6.40 and 6.98 percentage points, respectively. Compared with HRNet, UNet and PSPNet, the memory footprint was reduced by 82%, 79% and 88%, respectively. The results can provide a reference for intelligent picking and safe operation of tomato-picking robots.
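For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MIoU and MPA are commonly computed from a pixel-level confusion matrix. It is an illustration under assumptions, not the evaluation code behind the reported figures: the function names `confusion_matrix` and `miou_and_mpa` are hypothetical, labels are assumed to be flattened integer class indices, and classes that never occur are skipped when averaging, a convention the abstract does not specify.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def miou_and_mpa(cm):
    """Return (MIoU, MPA) from a square confusion matrix.

    IoU for class c = TP / (ground truth + predicted - TP).
    Pixel accuracy for class c = TP / ground truth.
    Assumption: classes with an empty denominator are left out of the mean.
    """
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]
        gt = sum(cm[c])                          # pixels whose true class is c
        pred = sum(cm[r][c] for r in range(n))   # pixels predicted as class c
        union = gt + pred - tp
        if union > 0:
            ious.append(tp / union)
        if gt > 0:
            accs.append(tp / gt)
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Toy example with three classes and six pixels (made-up labels, not paper data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou, mpa = miou_and_mpa(cm)
print(f"MIoU = {miou:.4f}, MPA = {mpa:.4f}")  # MIoU = 0.5000, MPA = 0.6667
```

Because both metrics average over classes rather than over pixels, a small-target category such as thin line carries the same weight as a large one, which is why improvements on small targets can move MIoU noticeably.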