Abstract: In field environments, picking robots face a large number of picking tasks as well as randomness and uncertainty in the positions of targets and obstacles. Traditional picking path planning methods usually combine kinematics equations with a shortest-path algorithm, which requires substantial computation time for each planning cycle. In order to improve the efficiency of trajectory planning and adapt to the field picking environment, a virtual robot picking path planning method based on deep reinforcement learning was proposed. Firstly, the random action strategies of the virtual robot were set according to the physical structure of the real robot, and the environment observation set, used as the input of the network, was designed by analyzing actual picking behavior. A reward function was then established with reference to the ideas of target attraction and obstacle repulsion in the artificial potential field method; it was used to evaluate the behavior of the virtual robot and to improve the success rate of obstacle avoidance. To address the problem that the range-based repulsion of the artificial potential field method interferes with shortest-path planning, a directional-penalty obstacle avoidance function was proposed, which converted the obstacle range penalty into a penalty in a single direction. In addition, a motion collision model of the virtual robot was established, and the direction penalties were applied selectively according to the analysis results of the model. Finally, a simulation environment was built in Unity, and the distributed proximal policy optimization (DPPO) algorithm was used to train the virtual robot. The simulation results showed that the success rate of the virtual robot in completing the picking task was over 96.7% with obstacles placed at different positions.
In 200 random picking experiments, the directional-penalty obstacle avoidance function method achieved a picking success rate of 97.5%, 11 percentage points higher than the ordinary reward function method, and picking trajectory planning took an average of 0.64 s per run, 0.45 s shorter than the artificial potential field method. The results showed that the system could efficiently guide the virtual robot to quickly reach random picking points while avoiding obstacles, meeting the requirements of picking tasks and providing theoretical and technical support for real robot picking path planning.
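To make the reward design concrete, the following is a minimal sketch of one plausible form of an attraction-plus-directional-penalty reward. It is not the paper's actual implementation: the function name, the gain constants `k_att`, `k_dir`, and the heading-cone threshold `fov_cos` are all assumptions introduced here for illustration. The key idea it demonstrates is that, unlike the radial repulsion field of the artificial potential field method, the penalty is applied only when the robot's step direction points toward an obstacle.

```python
import numpy as np

def directional_penalty_reward(pos, prev_pos, target, obstacles,
                               k_att=1.0, k_dir=0.5, fov_cos=0.8):
    """Hypothetical reward: APF-style attraction plus a single-direction
    obstacle penalty. All gains and thresholds are illustrative, not
    taken from the paper."""
    # Attraction term: reward progress made toward the picking target.
    progress = (np.linalg.norm(prev_pos - target)
                - np.linalg.norm(pos - target))
    r = k_att * progress

    # Directional penalty: penalize only steps whose heading points at an
    # obstacle (within a cosine cone), instead of a radial repulsion field.
    step = pos - prev_pos
    step_norm = np.linalg.norm(step)
    if step_norm > 1e-9:
        heading = step / step_norm
        for obs in obstacles:
            to_obs = obs - pos
            d = np.linalg.norm(to_obs)
            if d > 1e-9 and heading @ (to_obs / d) > fov_cos:
                r -= k_dir / (d + 1e-3)  # closer obstacles penalize more
    return r
```

Because the penalty vanishes for motion that merely passes near an obstacle without heading into it, such a shaping term avoids the detours that a range-based repulsion field can force onto an otherwise shortest path.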