Reinforcement learning control of an underground loader is investigated in a simulated environment, using a multi-agent deep neural network approach. At the start of each loading cycle, one agent selects the dig position from a depth camera image of the pile of fragmented rock. A second agent is responsible for continuous control of the vehicle, with the goal of filling the bucket at the selected loading point while avoiding collisions, getting stuck, or losing ground traction. This control agent relies on motion and force sensors, as well as on camera and lidar data. Using a soft actor-critic algorithm, the agents learn policies for efficient bucket filling over many consecutive loading cycles, with a clear ability to adapt to the changing environment. The best results, on average 75% of the maximum bucket capacity, are obtained when the reward includes a penalty for energy usage.
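The role of the energy penalty in the reward can be illustrated with a minimal sketch. The abstract does not state the functional form of the reward, so everything here is an assumption made for illustration: the function name `loading_reward`, the linear penalty, the weight `energy_weight`, the normalized `energy_used` input, and the flat penalty for a failed cycle are all hypothetical and are not taken from the paper.

```python
def loading_reward(
    fill_fraction: float,
    energy_used: float,
    energy_weight: float = 0.1,
    failed: bool = False,
) -> float:
    """Hypothetical per-cycle reward for a bucket-filling agent.

    fill_fraction: bucket load as a fraction of maximum capacity, in [0, 1].
    energy_used:   energy spent during the cycle, in assumed normalized units.
    energy_weight: assumed trade-off weight between fill and energy.
    failed:        True if the cycle ended in a collision, a stuck vehicle,
                   or loss of traction (assumed to be detected elsewhere).
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must lie in [0, 1]")
    if energy_used < 0.0:
        raise ValueError("energy_used must be non-negative")
    if failed:
        # Assumed flat penalty for any failure mode.
        return -1.0
    # Reward the fill fraction and subtract a linear energy penalty.
    return fill_fraction - energy_weight * energy_used
```

Under these assumptions, setting `energy_weight` to zero recovers a reward based on fill alone, which gives a simple way to compare training runs with and without the energy term.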