This paper presents a framework that leverages both control theory and machine learning to obtain stable and robust bipedal locomotion without the need for manual parameter tuning. Traditionally, gaits are generated through trajectory optimization methods and then realized experimentally, a process that often requires extensive tuning due to differences between the models and the hardware. In this work, gait realization via optimization based on hybrid zero dynamics (HZD) is formally combined with preference-based learning to systematically realize dynamically stable walking. Importantly, this learning approach does not require a carefully constructed reward function; instead, it relies on pairwise preferences provided by a human. The effectiveness of the proposed approach is demonstrated through two experiments on the planar biped AMBER-3M. The first uses rigid point feet. The second induces model uncertainty by adding springs, and this added compliance was accounted for in neither the gait generation nor the controller. In both experiments, the framework achieves stable, robust, efficient, and natural walking in fewer than 50 iterations without relying on a simulation environment. These results represent a promising step toward unifying control theory and learning.
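As a rough illustration of learning from pairwise preferences instead of a hand-crafted reward, the sketch below models a user's choices with a Bradley-Terry (logistic) utility model over a small set of candidate gait parameter vectors. This is a deliberately simple, swapped-in technique and not the learning algorithm used in this work. The HZD gait optimization is not modeled; each candidate simply stands in for a gait that such an optimization would produce. Everything here is hypothetical: the names `fit_utilities` and `preference_loop`, the query rule (current best guess versus the least-queried alternative), the regularization weight, and the simulated user.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A recorded preference is (winner_index, loser_index).
Preference = Tuple[int, int]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_utilities(n: int, prefs: Sequence[Preference], reg: float = 0.1,
                  lr: float = 0.1, steps: int = 500) -> List[float]:
    """Hypothetical helper: MAP estimate of Bradley-Terry utilities, where
    P(a preferred over b) = sigmoid(u[a] - u[b]), with an L2 (Gaussian) prior,
    found by plain gradient ascent on the log-posterior."""
    u = [0.0] * n
    for _ in range(steps):
        grad = [-reg * ui for ui in u]  # gradient of the L2 prior term
        for w, l in prefs:
            g = 1.0 - sigmoid(u[w] - u[l])  # d/du[w] of log sigmoid(u[w] - u[l])
            grad[w] += g
            grad[l] -= g
        u = [ui + lr * gi for ui, gi in zip(u, grad)]
    return u


def preference_loop(candidates: Sequence[Sequence[float]],
                    ask_user: Callable[[Sequence[float], Sequence[float]], bool],
                    iterations: int) -> Tuple[int, List[Preference]]:
    """Hypothetical loop: each iteration shows the user the current best guess
    and the least-queried alternative, records which one they prefer, and
    refits the utility model. Returns the final best index and all queries."""
    n = len(candidates)
    prefs: List[Preference] = []
    counts = [0] * n
    u = [0.0] * n
    for _ in range(iterations):
        best = max(range(n), key=lambda i: u[i])
        challenger = min((i for i in range(n) if i != best), key=lambda i: counts[i])
        prefer_best = ask_user(candidates[best], candidates[challenger])
        winner, loser = (best, challenger) if prefer_best else (challenger, best)
        prefs.append((winner, loser))
        counts[best] += 1
        counts[challenger] += 1
        u = fit_utilities(n, prefs)
    return max(range(n), key=lambda i: u[i]), prefs


if __name__ == "__main__":
    # Hypothetical gait parameters: (step length [m], step duration [s]).
    gaits = [(0.10, 0.40), (0.14, 0.45), (0.18, 0.50), (0.22, 0.55)]
    target = (0.17, 0.50)  # hidden "ideal" gait of the simulated user

    def distance(g: Sequence[float]) -> float:
        return sum((gi - ti) ** 2 for gi, ti in zip(g, target))

    def simulated_user(a: Sequence[float], b: Sequence[float]) -> bool:
        # Stand-in for a human: prefers the gait closer to the hidden target.
        return distance(a) <= distance(b)

    best, prefs = preference_loop(gaits, simulated_user, iterations=12)
    print("queries:", len(prefs), "selected gait:", gaits[best])
```

The user only ever answers "which of these two is better?", so no numeric reward has to be designed. The cost is that utilities are identified only up to the comparisons actually asked, which is why the choice of which pair to query matters in practice.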