In this work, we establish heuristics and learning strategies for the autonomous control of a dozer grading an uneven area studded with sand piles. We formalize the problem as a Markov Decision Process, design a simulation that demonstrates agent-environment interactions, and finally compare our simulator to a real dozer prototype. We use methods from reinforcement learning, behavior cloning, and contrastive learning to train a hybrid policy. Our trained agent, AGPNet, reaches human-level performance and outperforms current state-of-the-art machine learning methods for the autonomous grading task. In addition, our agent is capable of generalizing from random scenarios to unseen real-world problems.