This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms. In action-constrained RL, every action taken by the learning system must satisfy given constraints. Such constraints are crucial for ensuring that actions are feasible and safe in real-world systems. We evaluate existing algorithms and novel variants of them across multiple robotics control environments and multiple types of action constraints. Our evaluation provides the first in-depth perspective on the field and reveals surprising insights, including the effectiveness of a straightforward baseline approach. The benchmark problems and the code used in our experiments are available online at github.com/omron-sinicx/action-constrained-RL-benchmark for further research and development.