Rice Knowledge Text Classification Method Based on a Deep Convolutional Neural Network

doi:10.6041/j.issn.1000-1298.2021.03.029

Transactions of the Chinese Society for Agricultural Machinery, 2021, Vol. 52, No. 3, pp. 257-264

Funding: National Key Research and Development Program of China (2018YFD0300309)

Rice Knowledge Text Classification Based on Deep Convolution Neural Network


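Abstract (translated from the Chinese):

To address inaccurate text feature extraction and the loss of classification performance as networks become deeper, a rice knowledge text classification method based on a deep convolutional neural network is proposed. Given the characteristics of rice knowledge text, Word2Vec was used to vectorize the text and was compared with One-Hot, TF-IDF and Hashing. Word2Vec gave the higher classification accuracy, 86.44%, and effectively mitigates sparse vector representations and incomplete information. By adjusting the structure of the residual network (ResNet) and analyzing how the residual module structure and the network depth affect the classification network, nine classification network structures were constructed. Tests showed that the network with a 4-layer residual module structure had good feature extraction accuracy, with a Top-1 accuracy of 99.79%. Taking the selected 4-layer residual module structure as the basic structure and replacing its pooling layer with a capsule network (CapsNet), a rice knowledge text classification model was designed. Comparison with six text classification models (FastText, BiLSTM, Atten-BiGRU, RCNN, DPCNN and TextCNN) showed that the proposed model classifies rice knowledge texts of different sample sizes and levels of complexity accurately: its precision, recall and F1 score were no less than 95.17%, 95.83% and 95.50%, respectively, and its accuracy was 98.62%. The model achieves accurate and efficient rice knowledge text classification and meets practical application needs.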
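The sparsity problem that motivates the comparison of vectorization methods can be illustrated with a small sketch. The code below is not the authors' implementation: the toy documents, the smoothed IDF formula, and the bucket count are assumptions chosen for illustration. It builds binary bag-of-words (One-Hot style), TF-IDF and hashed vectors for pre-segmented documents and reports how many entries are zero. Word2Vec is omitted because it requires training an embedding model; its dense, low-dimensional vectors are what avoid the zero-dominated representations shown here.

```python
import math
import zlib
from collections import Counter

# Toy, pre-segmented documents (hypothetical, not from the paper's data set).
docs = [
    ["rice", "blast", "leaf", "lesion"],
    ["rice", "planthopper", "pest", "control"],
    ["rice", "seedling", "transplanting", "density"],
]

vocab = sorted({tok for doc in docs for tok in doc})
index = {tok: i for i, tok in enumerate(vocab)}


def one_hot_bag(doc):
    """Binary bag-of-words: 1 if the vocabulary word occurs in the document."""
    vec = [0] * len(vocab)
    for tok in doc:
        vec[index[tok]] = 1
    return vec


def tf_idf(doc):
    """TF-IDF with an assumed smoothed idf = ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    vec = [0.0] * len(vocab)
    for tok, count in Counter(doc).items():
        df = sum(1 for d in docs if tok in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        vec[index[tok]] = (count / len(doc)) * idf
    return vec


def hashing(doc, n_buckets=8):
    """Hashing trick: count tokens in one of n_buckets chosen by a stable hash."""
    vec = [0] * n_buckets
    for tok in doc:
        vec[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1
    return vec


def sparsity(vec):
    """Fraction of entries that are exactly zero."""
    return sum(1 for x in vec if x == 0) / len(vec)


for name, fn in [("one-hot", one_hot_bag), ("tf-idf", tf_idf), ("hashing", hashing)]:
    print(name, [round(sparsity(fn(doc)), 2) for doc in docs])
```

Even with a ten-word vocabulary, most entries of the One-Hot and TF-IDF vectors are zero, and the fraction grows with vocabulary size. Hashing fixes the dimension but can map different words to the same bucket.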

Abstract:

Classifying rice-related text from agricultural sources into weeds, pests, diseases and cultivation management is a typical text classification problem, and it underpins key information extraction, text data mining and intelligent agricultural question answering. Chinese text, and agricultural text in particular, suffers from data redundancy, sparsity and a lack of standardization. Deep learning can extract the key features of text automatically, and the resulting models are adaptable and transferable. Therefore, to address the loss of classification performance caused by inaccurate text feature extraction and deeper network hierarchies, a rice knowledge text classification method oriented to Q&A systems was proposed. A Python crawler built on Scrapy was used to collect Chinese text on rice diseases and pests, weeds, and cultivation and management from sources such as the Hownet expert online system and planting question-and-answer websites, as training and test samples. Jieba was applied to segment the rice knowledge text, and useless symbols and stop words were removed. Because the results of Chinese word segmentation depend heavily on the segmentation lexicon, a rice-related corpus was constructed on the basis of the Sogou agricultural corpus to improve segmentation precision and reduce mis-segmentation and omissions. This expanded the basic Jieba dictionary and improved the recognition of specialized terms for rice diseases, insect pests, weeds and pesticides, and cultivation and management.
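Why the lexicon matters for segmentation can be seen in a minimal sketch. The code below does not use Jieba and is not the authors' pipeline: it substitutes a simple forward maximum matching segmenter, and the lexicon entries, stop-word list and example question are hypothetical. It shows the same sentence segmented with and without a domain term.

```python
def forward_max_match(text, lexicon, max_len=6):
    """Greedy segmentation: at each position take the longest lexicon word."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no lexicon word matches.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop stop words and punctuation listed in stop_words."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical general-purpose lexicon and a domain-extended version of it.
base_lexicon = {"水稻", "怎么", "防治"}
domain_lexicon = base_lexicon | {"稻瘟病"}  # "rice blast", an assumed domain entry
stop_words = {"怎么", "？"}

question = "水稻稻瘟病怎么防治？"  # "How is rice blast in rice controlled?"

print(forward_max_match(question, base_lexicon))
# ['水稻', '稻', '瘟', '病', '怎么', '防治', '？']
print(remove_stop_words(forward_max_match(question, domain_lexicon), stop_words))
# ['水稻', '稻瘟病', '防治']
```

Without the domain entry, the disease name is broken into three single characters that carry little meaning on their own. With it, the term survives as one token. Jieba's user dictionary serves the same purpose in a real pipeline.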
Word2Vec was used to vectorize the text and was compared with One-Hot, TF-IDF and Hashing. The comparison showed that Word2Vec effectively addresses typical problems of text vectors such as sparsity and incomplete information. Starting from the basic structure of ResNet, nine rice knowledge text classification networks were constructed by varying the design of the residual module and the network depth. Tests indicated that the network with a 4-layer residual module structure had good feature extraction accuracy, with a Top-1 accuracy of 99.79%. In a convolutional neural network, the pooling layer performs down-sampling, which loses some of the relative position information of phrases in the text and thus reduces classification accuracy. The selected 4-layer residual module structure was therefore taken as the basic structure, and CapsNet was used to replace the pooling layer, giving a rice knowledge text classification model referred to as RIC-Net. Comparison with six text classification models (FastText, BiLSTM, Atten-BiGRU, RCNN, DPCNN and TextCNN) showed that the model classifies rice knowledge texts of different sample sizes and levels of complexity accurately: its precision, recall and F1 score were no less than 95.17%, 95.83% and 95.50%, respectively, and its accuracy reached 98.62%. The model achieves accurate and efficient classification of rice knowledge text and meets practical application requirements.
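The reported metrics can be made concrete with a short sketch. The abstract does not state how the "no less than" figures were aggregated; one plausible reading, assumed here, is that they are the minimum per-class precision, recall and F1, while the accuracy is computed over all samples. The function name, labels and predictions below are hypothetical.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return ({label: (precision, recall, f1)}, overall accuracy)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f_den = precision + recall
        f1 = 2 * precision * recall / f_den if f_den else 0.0
        report[label] = (precision, recall, f1)
    accuracy = sum(tp.values()) / len(y_true)
    return report, accuracy


# Hypothetical labels for the four categories named in the abstract.
y_true = ["disease", "disease", "pest", "pest", "weed", "cultivation"]
y_pred = ["disease", "pest", "pest", "pest", "weed", "cultivation"]

report, accuracy = per_class_metrics(y_true, y_pred)
for label, (p, r, f1) in report.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
# Lower bounds across classes, in the sense assumed above.
print("min precision:", min(p for p, _, _ in report.values()))
```

A single misclassified sample lowers the recall of its true class and the precision of the predicted class, so per-class minima can sit well below the overall accuracy, as they do in the abstract.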


Cite this article

FENG Shuai, XU Tongyu, ZHOU Yuncheng, ZHAO Dongxue, JIN Ning, WANG Haoriqin. Rice Knowledge Text Classification Based on Deep Convolution Neural Network[J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(3): 257-264.


History

Received: 2020-06-13
Published online: 2021-03-10
