Many real-life scenarios require humans to make difficult trade-offs: do we always follow all the traffic rules, or do we violate the speed limit in an emergency? These scenarios force us to weigh collective rules and norms against our own personal objectives and desires. To create effective AI-human teams, we must equip AI agents with a model of how humans make these trade-offs in complex environments with implicit and explicit rules and constraints. Agents equipped with such models will be able to mirror human behavior and/or draw human attention to situations where decision making could be improved. To this end, we propose a novel inverse reinforcement learning (IRL) method, Max Entropy Inverse Soft Constraint IRL (MESC-IRL), for learning implicit hard and soft constraints over states, actions, and state features from demonstrations in deterministic and non-deterministic environments modeled as Markov Decision Processes (MDPs). Our method enables agents to learn human constraints and desires implicitly, without explicit modeling by the agent designer, and to transfer these constraints between environments. It generalizes prior work, which considered only hard constraints in deterministic environments, and achieves state-of-the-art performance.