This work proposed an efficient learning-based framework for learning feedback control policies from human teleoperated demonstrations, which achieved obstacle negotiation, staircase traversal, slip control and parcel delivery on a tracked robot. Because of uncertainties in real-world scenarios, e.g. obstacles and slippage, closed-loop feedback control plays an important role in improving robustness and resilience, but control laws that achieve autonomous behaviours are difficult to program manually. We formulated an architecture based on a long short-term memory (LSTM) neural network, which effectively learns reactive control policies from human demonstrations. Using datasets from a few real demonstrations, our algorithm can directly learn successful policies, including obstacle negotiation, stair climbing and delivery, fall recovery and corrective control of slippage. We proposed decomposing complex robot actions to reduce the difficulty of learning long-term dependencies. Furthermore, we proposed a method to efficiently handle non-optimal demonstrations and to learn new skills, since collecting enough demonstrations can be time-consuming and sometimes very difficult on a real robotic system.
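To illustrate what a reactive LSTM policy means in this setting, the following is a minimal sketch of the forward pass only, not the architecture used in this work. The class name `LSTMPolicy`, the three-element observation (for example pitch, roll and a slip estimate), the two-element command (for example left and right track velocities) and the random placeholder weights are all assumptions made for illustration; in practice the weights would be fitted to the demonstration data, which is omitted here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMPolicy:
    """Hypothetical single-layer LSTM policy: observations in, bounded commands out."""

    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        in_dim = obs_dim + hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f),
        # output (o) and candidate (g). Weights are untrained placeholders.
        self.W = {k: rand(hidden_dim, in_dim) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        self.W_out = rand(act_dim, hidden_dim)
        self.hidden_dim = hidden_dim
        self.reset()

    def reset(self):
        # Clear the memory at the start of each episode.
        self.h = [0.0] * self.hidden_dim
        self.c = [0.0] * self.hidden_dim

    def step(self, obs):
        # Closed loop: the command depends on the current observation
        # and on the memory carried over from earlier steps.
        x = list(obs) + self.h
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], x), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        self.c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, self.c, i, g)]
        self.h = [ov * math.tanh(cv) for ov, cv in zip(o, self.c)]
        # tanh keeps each command within (-1, 1).
        return [math.tanh(a) for a in matvec(self.W_out, self.h)]


policy = LSTMPolicy(obs_dim=3, hidden_dim=8, act_dim=2)
obs = [0.2, -0.1, 0.05]  # made-up sensor reading
first = policy.step(obs)
second = policy.step(obs)  # same reading, but the memory has changed
```

Feeding the same observation twice yields different commands because the hidden and cell states change between steps; this memory is what lets a recurrent policy react to how a situation is evolving rather than to a single sensor snapshot.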