In this paper, we show how Behavior Trees with performance guarantees, in terms of safety and goal convergence, can be extended with components designed using machine learning, without destroying those guarantees. Machine learning approaches such as reinforcement learning or learning from demonstration can be very appealing to AI designers who want efficient and realistic behaviors in their agents. However, such algorithms seldom provide guarantees for solving the given task in all situations while keeping the agent safe. Such guarantees are instead often easier to obtain for manually designed, model-based approaches. In this paper, we exploit the modularity of Behavior Trees to extend a given design with an efficient, but possibly unreliable, machine learning component in a way that preserves the guarantees. The approach is illustrated with an inverted pendulum example.
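The core idea can be sketched in code: a Behavior Tree Fallback node tries a learned policy first and hands control to a guaranteed model-based controller whenever the learned component declines to act. This is a minimal illustration under our own assumptions (the class names, the `in_safe_region` check, and the toy pendulum gains are hypothetical, not taken from the paper):

```python
# Sketch: wrapping an unreliable learned policy in a BT Fallback node so a
# manually designed controller with guarantees takes over when needed.
# All names and gains below are illustrative assumptions.

SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

class Fallback:
    """Tick children left to right; return the first non-FAILURE status."""
    def __init__(self, children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != FAILURE:
                return status
        return FAILURE

class LearnedPolicy:
    """Efficient ML component, trusted only inside a verified region."""
    def __init__(self, policy, in_safe_region):
        self.policy = policy
        self.in_safe_region = in_safe_region

    def tick(self, state):
        if not self.in_safe_region(state):
            return FAILURE  # defer to the guaranteed controller
        state["u"] = self.policy(state)
        return RUNNING

class SafeController:
    """Model-based controller with safety/convergence guarantees."""
    def __init__(self, controller):
        self.controller = controller

    def tick(self, state):
        state["u"] = self.controller(state)
        return RUNNING

# Inverted-pendulum flavoured example: the learned policy is only
# trusted near the upright position (|theta| < 0.3 rad, an assumption).
learned = LearnedPolicy(policy=lambda s: -2.0 * s["theta"],
                        in_safe_region=lambda s: abs(s["theta"]) < 0.3)
safe = SafeController(controller=lambda s: -5.0 * s["theta"] - 1.0 * s["dtheta"])
tree = Fallback([learned, safe])

near_upright = {"theta": 0.1, "dtheta": 0.0}
tree.tick(near_upright)   # learned policy sets the control input

far_from_upright = {"theta": 1.0, "dtheta": 0.0}
tree.tick(far_from_upright)  # learned policy fails; safe controller acts
```

Because the Fallback node only ever executes the learned policy inside its verified region, the composed tree inherits the safety and convergence guarantees of the model-based controller while still benefiting from the learned component where it is trusted.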