Abstract: In field environments, picking robots face a large number of picking tasks as well as randomness and uncertainty in the positions of targets and obstacles. Traditional picking path planning methods usually combine kinematics equations with a shortest-path algorithm, which requires substantial computation time for each planning cycle. In order to improve the efficiency of trajectory planning and adapt to the field picking environment, a virtual robot picking path planning method based on deep reinforcement learning was proposed. Firstly, the random action strategies of the virtual robot were set according to the physical structure of the real robot, and the environment observation set, used as the input of the network, was designed by analyzing actual picking behavior. A reward function was then established with reference to the ideas of target attraction and obstacle repulsion in the artificial potential field method; it was used to evaluate the behavior of the virtual robot and to improve the success rate of obstacle avoidance. To address the problem that the range-based repulsion of the artificial potential field method interferes with shortest-path planning, a directional-penalty obstacle avoidance function was proposed, which converted the obstacle range penalty into a penalty in a single direction. In addition, a motion collision model of the virtual robot was established, and the direction penalties were applied selectively according to the analysis results of the model. Finally, a simulation environment was built in Unity, and the distributed proximal policy optimization (DPPO) algorithm was used to train the virtual robot. The simulation results showed that the success rate of the virtual robot in completing the picking task was over 96.7% with obstacles placed at different positions.
In 200 random picking experiments, the directional-penalty obstacle avoidance function method achieved a picking success rate of 97.5%, 11 percentage points higher than the ordinary reward function method, and picking trajectory planning took an average of 0.64 s per run, 0.45 s shorter than the artificial potential field method. The results showed that the system could efficiently guide the virtual robot to quickly reach random picking points while avoiding obstacles, meeting the requirements of picking tasks and providing theoretical and technical support for real robot picking path planning.
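To make the reward design concrete, the following is a minimal sketch of one plausible form of an attraction-plus-directional-penalty reward. It is not the paper's actual implementation: the function name, the gain constants `k_att`, `k_dir`, and the heading-cone threshold `fov_cos` are all assumptions introduced here for illustration. The key idea it demonstrates is that, unlike the radial repulsion field of the artificial potential field method, the penalty is applied only when the robot's step direction points toward an obstacle.

```python
import numpy as np

def directional_penalty_reward(pos, prev_pos, target, obstacles,
                               k_att=1.0, k_dir=0.5, fov_cos=0.8):
    """Hypothetical reward: APF-style attraction plus a single-direction
    obstacle penalty. All gains and thresholds are illustrative, not
    taken from the paper."""
    # Attraction term: reward progress made toward the picking target.
    progress = (np.linalg.norm(prev_pos - target)
                - np.linalg.norm(pos - target))
    r = k_att * progress

    # Directional penalty: penalize only steps whose heading points at an
    # obstacle (within a cosine cone), instead of a radial repulsion field.
    step = pos - prev_pos
    step_norm = np.linalg.norm(step)
    if step_norm > 1e-9:
        heading = step / step_norm
        for obs in obstacles:
            to_obs = obs - pos
            d = np.linalg.norm(to_obs)
            if d > 1e-9 and heading @ (to_obs / d) > fov_cos:
                r -= k_dir / (d + 1e-3)  # closer obstacles penalize more
    return r
```

Because the penalty vanishes for motion that merely passes near an obstacle without heading into it, such a shaping term avoids the detours that a range-based repulsion field can force onto an otherwise shortest path.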