Abstract: Plant phenotyping is an important research topic in botany, and phenotypic similarity, which is widely used in plant taxonomy, ecology, digital agriculture and related fields, is one of its central concerns. Chrysanthemum is an important plant both in China and worldwide, and evaluating the phenotypic similarity of chrysanthemums plays an important role in their classification and phenotypic research. Because massive collections of chrysanthemum data are high-dimensional, phenotype analysis is challenging; from this perspective, chrysanthemum phenotypic similarity query and evaluation were studied based on the multi-probe locality sensitive hashing (LSH) technique. To evaluate the similarity of chrysanthemum images, SIFT features were extracted from the images and clustered with the K-means method, and a bag of visual words (BoVW) model was then built. Because the image features are high-dimensional, query efficiency becomes a major challenge for massive image collections, so multi-probe LSH, an optimization technique for similarity queries over high-dimensional data, was applied to chrysanthemum phenotype similarity computing. With this technique, a hash data structure over the chrysanthemum image data was constructed, which improved the efficiency of chrysanthemum similarity queries while preserving the quality of the query results. The theory of multi-probe LSH was analyzed, and extensive experiments were conducted to evaluate it.
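The pipeline described above (K-means over SIFT descriptors to form a visual vocabulary, BoVW histograms, then a multi-probe LSH index) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: random 128-dimensional vectors stand in for SIFT descriptors (in practice they might come from OpenCV's `cv2.SIFT_create()`), the hash family is assumed to be the Gaussian (p-stable) family h(v) = floor((a·v + b)/w) commonly used for Euclidean LSH, and the probing sequence is a simplified query-directed scheme that only perturbs one hash coordinate by ±1 at a time. The names `kmeans`, `bovw_histogram` and `MultiProbeLSH`, and the parameters `k` (hash functions per table), `L` (number of tables), `w` (bucket width) and `n_probes`, are hypothetical.

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's K-means; the k centers serve as the visual vocabulary."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every descriptor to its nearest center (squared Euclidean).
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # leave a center in place if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, vocab):
    """Count visual-word occurrences in one image, then L2-normalize."""
    dists = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(vocab)).astype(float)
    return hist / (np.linalg.norm(hist) or 1.0)

class MultiProbeLSH:
    """L hash tables; each key is k p-stable hashes floor((a.v + b) / w)."""

    def __init__(self, dim, k=8, L=4, w=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(L, k, dim))  # Gaussian (2-stable) projections
        self.b = rng.uniform(0.0, w, size=(L, k))
        self.w = w
        self.tables = [{} for _ in range(L)]

    def _project(self, v):
        return (self.a @ v + self.b) / self.w  # shape (L, k), real-valued

    def index(self, vectors):
        self.data = np.asarray(vectors, dtype=float)
        for i, v in enumerate(self.data):
            keys = np.floor(self._project(v)).astype(int)
            for t, key in enumerate(keys):
                self.tables[t].setdefault(tuple(key), []).append(i)

    def query(self, q, top=1, n_probes=4):
        f = self._project(q)
        home = np.floor(f).astype(int)
        frac = f - home  # position of the projection inside its slot, in [0, 1)
        candidates = set()
        for t, table in enumerate(self.tables):
            # Single-coordinate steps, scored by squared distance to the slot
            # edge they cross; nearer edges are likelier to hide a neighbor.
            steps = [(frac[t, j] ** 2, j, -1) for j in range(f.shape[1])]
            steps += [((1.0 - frac[t, j]) ** 2, j, +1) for j in range(f.shape[1])]
            probes = [home[t]]  # the home bucket is always probed
            for _, j, delta in sorted(steps)[:n_probes]:
                probe = home[t].copy()
                probe[j] += delta
                probes.append(probe)
            for probe in probes:
                candidates.update(table.get(tuple(probe), []))
        if not candidates:
            return []
        ids = np.fromiter(candidates, dtype=int)
        # Re-rank the candidates by exact Euclidean distance.
        dist = np.linalg.norm(self.data[ids] - q, axis=1)
        return ids[np.argsort(dist)[:top]].tolist()

# Usage with synthetic stand-ins for per-image SIFT descriptors.
rng = np.random.default_rng(1)
images = [rng.random((40, 128)) for _ in range(30)]
vocab = kmeans(np.vstack(images), k=16)
hists = np.array([bovw_histogram(d, vocab) for d in images])
lsh = MultiProbeLSH(dim=16, k=6, L=4)
lsh.index(hists)
print(lsh.query(hists[3], top=3))
```

Probing a few neighboring buckets per table generally lets a multi-probe index reach a given success probability with fewer hash tables than plain LSH, trading a little extra query work for a smaller index.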
Experiments showed that, compared with linear scanning, the average success probability of the query exceeded 0.90 and the average acceleration ratio ranged from 3.3 to 19.8. The method was also compared with a typical alternative, entropy-based LSH, in terms of query quality and query efficiency, and the results demonstrated that it outperformed entropy-based LSH in both respects. The experimental results further revealed that query quality and query efficiency can be tuned flexibly by setting the number of hash functions and the number of hash tables, which offers an adjustable trade-off between quality and efficiency. In addition, the method can serve as a technical reference for large-scale chrysanthemum phenotypic similarity computation.
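The two evaluation figures and the role of the two tuning parameters can be made concrete with a short Python sketch. The metric definitions are assumptions, since the abstract does not spell them out: success probability is taken as the fraction of queries whose exact nearest neighbor (found by linear scan) is among the ids returned by the LSH query, and the acceleration ratio as total linear-scan time divided by total LSH query time. `approx_query` is a hypothetical callable, for example a bound LSH query method. `single_probe_success` uses the standard estimate 1 - (1 - p^k)^L for plain LSH with k hash functions per table and L tables, where p is the per-hash collision probability of a given neighbor; it covers only the home bucket, so for the same hash functions multi-probe LSH can only match or exceed it. It is intuition for the tuning behavior, not the paper's analysis.

```python
import time
import numpy as np

def success_and_speedup(data, queries, approx_query):
    """Return (success probability, acceleration ratio) against linear scan.

    Success: the exact nearest neighbor found by linear scan appears among the
    ids returned by approx_query(q). Acceleration ratio: total linear-scan time
    divided by total approximate-query time.
    """
    hits, t_linear, t_approx = 0, 0.0, 0.0
    for q in queries:
        t0 = time.perf_counter()
        exact = int(np.linalg.norm(data - q, axis=1).argmin())
        t_linear += time.perf_counter() - t0
        t0 = time.perf_counter()
        found = approx_query(q)
        t_approx += time.perf_counter() - t0
        hits += exact in found
    return hits / len(queries), t_linear / max(t_approx, 1e-12)

def single_probe_success(p, k, L):
    """Classic LSH estimate for one neighbor whose per-hash collision
    probability is p: it must match all k hashes in at least one of L tables."""
    return 1.0 - (1.0 - p ** k) ** L

# Raising k makes buckets more selective (fewer candidates, lower success);
# raising L adds tables (more work per query, higher success).
for k, L in [(4, 2), (8, 2), (8, 8)]:
    print(k, L, round(single_probe_success(0.8, k, L), 3))
```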