Robots have become increasingly capable of performing tasks for humans by learning from their feedback, but they still often suffer from model misalignment due to missing or incorrectly learned features. When the features the robot needs for its task are missing from its representation or do not generalize well to new settings, the robot cannot learn the task the human intends and, worse, may learn a completely different, undesired behavior. Prior work shows how a robot can detect when its representation is missing a feature and then ask the human to teach it that feature; however, these methods do not differentiate between features that are entirely missing and features that exist but do not generalize to new environments. In the latter case, the robot would detect misalignment and simply learn yet another feature, leading to an arbitrarily growing feature representation that can, in turn, introduce spurious correlations and incorrect learning down the line. In this work, we separate these two sources of misalignment: we propose a framework for determining whether a feature the robot needs is incorrectly learned and fails to generalize to a new environment setup, or is entirely missing from the robot's representation. Once the source of error is detected, we show how the human can initiate the realignment process: if the feature is missing, we follow prior work for learning new features; if the feature exists but does not generalize, we use data augmentation to expand its training data and, thus, complete the correction. We demonstrate the proposed approach in experiments with a simulated 7DoF robot manipulator and physical human corrections.
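To make the two-branch realignment logic described above concrete, the following is a minimal, purely illustrative sketch. The distance-based confidence heuristic, the names `feature_confidence` and `realign`, the threshold value, and the trivial "augmentation" step are all hypothetical assumptions introduced for illustration only; they are not the framework's actual detection test or data-augmentation procedure.

```python
import numpy as np

# Purely illustrative sketch of the two-branch realignment decision: given that
# misalignment has already been detected, decide whether an existing feature
# merely fails to generalize (expand its training data) or whether no existing
# feature is responsible (learn a new feature, as in prior work). The heuristic
# below is a hypothetical placeholder, not the paper's method.


def feature_confidence(correction_states, feature_train_states, k=5):
    """Heuristic out-of-distribution check: confidence is high when the states
    visited during the human correction lie near the states the feature was
    originally trained on, and low when they fall outside that region."""
    dists = np.linalg.norm(
        correction_states[:, None, :] - feature_train_states[None, :, :], axis=-1)
    mean_knn_dist = np.sort(dists, axis=1)[:, :k].mean()
    return float(np.exp(-mean_knn_dist))


def realign(feature_train_sets, correction_states, threshold=0.5):
    """For each known feature, check whether the human correction was given in a
    region far from that feature's training data; if so, treat the feature as
    non-generalizing and augment its training set. Otherwise, fall back to
    learning an entirely new feature."""
    for name, train_states in feature_train_sets.items():
        if feature_confidence(correction_states, train_states) < threshold:
            # The feature exists but was queried outside its training
            # distribution: expand its training set (here, trivially, by
            # appending the correction states; real augmentation would go here).
            feature_train_sets[name] = np.vstack([train_states, correction_states])
            return f"retrain non-generalizing feature '{name}'"
    return "no existing feature explains the correction: learn a new feature"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature trained on states near the origin.
    train = {"distance_to_table": rng.normal(0.0, 1.0, size=(50, 3))}
    # Human correction given in a region far from that training data.
    corrections = rng.normal(5.0, 1.0, size=(10, 3))
    print(realign(train, corrections))
```

Under these assumptions, the example prints the "retrain" branch because the correction states fall far outside the feature's training distribution; if they had fallen inside it, the sketch would instead conclude that a new feature must be taught.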