Abstract: At present, soybean seed coat cracks are detected mainly by visual inspection, which is inefficient and error-prone. To address this, a method for automatically identifying soybean seed coat cracks based on near-infrared (NIR) spectroscopy and machine learning was proposed. The NIR spectra of 150 soybean samples (75 cracked and 75 normal) were collected with a Fourier transform near-infrared (FT-NIR) spectrometer. The spectra were analyzed both in their original form and after four preprocessing methods: standard normal variate (SNV), multiplicative scatter correction (MSC), and the first and second derivatives computed with Savitzky-Golay (SG) smoothing. Partial least squares discriminant analysis (PLS-DA), k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), stochastic gradient boosting (SGB), and extreme gradient boosting (XGBoost) were then used to build soybean seed coat crack identification models, and the effects of the different spectral preprocessing methods on the classification results of these six algorithms were compared. With appropriate spectral preprocessing, the validation set accuracy of every one of the six algorithms was at least 80.00%. PLS-DA performed best, with an optimal validation set accuracy of 90.00%, followed by XGBoost at 86.67%, and then SVM, KNN, SGB, and RF. The results show that NIR spectroscopy combined with machine learning is a feasible way to identify soybean seed coat cracks, and that PLS-DA applied to the original (unpreprocessed) spectra was the best identification method. This research provides a method for the automatic identification of soybean seed coat cracks.
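For readers unfamiliar with the preprocessing methods named above, the following is a minimal NumPy sketch of SNV, MSC, and an SG derivative. It assumes the spectra are stored as the rows of a 2-D array sampled on a uniform wavenumber grid. The function names, the default window length (11 points), the polynomial order (2), and the edge padding are illustrative assumptions, not the settings used in this study, and the derivative is expressed per sample point rather than per wavenumber unit.

```python
import math
from typing import Optional

import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: center and scale each spectrum (row) by its own mean and std."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra: np.ndarray, reference: Optional[np.ndarray] = None) -> np.ndarray:
    """Multiplicative scatter correction against a reference (default: the mean spectrum)."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Least-squares fit row ~ a + b * ref; np.polyfit returns [slope, intercept].
        b, a = np.polyfit(ref, row, deg=1)
        # Remove the additive (a) and multiplicative (b) scatter terms.
        corrected[i] = (row - a) / b
    return corrected


def savgol_derivative(spectra: np.ndarray, window: int = 11,
                      polyorder: int = 2, deriv: int = 1) -> np.ndarray:
    """Savitzky-Golay derivative of each row, in units of 'per sample point'.

    Assumption: edges are handled by repeating the first/last value ("edge" padding).
    """
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Least-squares polynomial fit inside each window: row `deriv` of the pseudo-inverse
    # of the Vandermonde matrix gives the coefficient of x**deriv; multiplying by deriv!
    # yields the derivative at the window centre.
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    coeffs = np.linalg.pinv(vander)[deriv] * math.factorial(deriv)
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    # Sliding dot product (a correlation, not a convolution) along the wavelength axis.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window, axis=1)
    return windows @ coeffs
```

In practice, `scipy.signal.savgol_filter(spectra, window, polyorder, deriv=1, axis=1)` offers the same filter with additional edge-handling modes. The hand-written version above only makes the underlying least-squares fit explicit.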