Abstract: To improve the accuracy of winter wheat yield estimation and to reduce the underestimation of high yields and overestimation of low yields that are common in yield estimation models, the Guanzhong Plain in Shaanxi Province, China, was taken as the study area. The vegetation temperature condition index (VTCI), leaf area index (LAI) and fraction of photosynthetically active radiation (FPAR) at ten-day intervals were selected as remotely sensed parameters. A deep learning model for estimating winter wheat yield was proposed that combines the local feature extraction capability of a convolutional neural network (CNN) with the global information extraction capability of a Transformer network based on the self-attention mechanism. Compared with the Transformer model (R² of 0.64, RMSE of 465.40 kg/hm², MAPE of 8.04%), the CNN-Transformer model estimated winter wheat yield more accurately (R² of 0.70, RMSE of 420.39 kg/hm², MAPE of 7.65%), extracted more yield-related information from the multiple remotely sensed parameters, and mitigated the underestimation of high yields and overestimation of low yields seen in the Transformer model. The robustness and generalization ability of the CNN-Transformer model were further validated using five-fold cross-validation and the leave-one-out method. In addition, the CNN-Transformer model was used to reveal the cumulative effect of the winter wheat growth process: the impact of gradually accumulating ten-day input information on yield estimation was analyzed, and the ability of the model to characterize the accumulation process at different growth stages was assessed. The results showed that the model can effectively capture the critical period of winter wheat growth, from late March to early May.
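
For readers who want to reproduce the three accuracy measures quoted above, the following is a minimal sketch of how R², RMSE and MAPE are commonly computed from observed and estimated yields. It assumes R² is the coefficient of determination (the abstract does not say whether a squared correlation coefficient was used instead), and the function names and sample yields are hypothetical illustrations, not data or code from the study.

```python
import math

def r2_score(observed, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, estimated):
    """Root mean square error, in the same unit as the yields (e.g. kg/hm²)."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

def mape(observed, estimated):
    """Mean absolute percentage error, in percent; observed yields must be non-zero."""
    n = len(observed)
    return 100.0 * sum(abs((o - e) / o) for o, e in zip(observed, estimated)) / n

# Hypothetical yields in kg/hm², for illustration only.
observed = [5000.0, 6000.0, 7000.0]
estimated = [5200.0, 5900.0, 6800.0]

print(f"R2   = {r2_score(observed, estimated):.3f}")
print(f"RMSE = {rmse(observed, estimated):.2f} kg/hm2")
print(f"MAPE = {mape(observed, estimated):.2f} %")
```

In this toy example the low yield is overestimated and the high yield is underestimated, which is the bias pattern the abstract describes; all three measures penalize it, but only the sign of the per-sample residuals shows its direction.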