Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, conventional group-level policy optimization algorithms become suboptimal because of their underlying assumption that every action contributes equally to the outcome, which deviates significantly from reality. Our analysis reveals that only a small fraction of actions are critical in determining the final outcome. Building on this insight, we propose CARL, a critical-action-focused reinforcement learning algorithm tailored for multi-step agents. CARL achieves focused training by providing action-level optimization signals for high-criticality actions while excluding low-criticality actions from model updates. Extensive experiments across diverse evaluation settings demonstrate that CARL achieves both stronger performance and higher efficiency during training and inference.
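The abstract does not spell out the exact objective, so the following is only a minimal sketch of the masking idea it describes, assuming a PPO/GRPO-style clipped surrogate; the names `carl_loss_sketch`, `criticality`, and the threshold `tau` are hypothetical, not the paper's actual API.

```python
import numpy as np

def carl_loss_sketch(logprobs, old_logprobs, advantages, criticality,
                     tau=0.5, clip_eps=0.2):
    """Hypothetical critical-action-focused surrogate loss.

    Assumed inputs (one entry per action in a multi-step trajectory):
      logprobs / old_logprobs : log pi(a|s) under current / behavior policy
      advantages              : action-level advantage estimates
      criticality             : assumed per-action criticality scores in [0, 1]
    Actions with criticality below `tau` are excluded from the update;
    only high-criticality actions receive gradient signal.
    """
    mask = (criticality >= tau).astype(np.float64)   # drop low-criticality actions
    ratio = np.exp(logprobs - old_logprobs)          # per-action importance ratio
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    per_action = np.minimum(ratio * advantages, clipped * advantages)
    # Average only over the retained (high-criticality) actions.
    denom = max(mask.sum(), 1.0)
    return -(per_action * mask).sum() / denom

# Toy usage: only the two actions with criticality >= 0.5 contribute.
loss = carl_loss_sketch(
    logprobs=np.array([-1.2, -0.8, -2.0, -0.5]),
    old_logprobs=np.array([-1.3, -0.9, -1.9, -0.6]),
    advantages=np.array([0.4, -0.1, 0.9, 0.2]),
    criticality=np.array([0.9, 0.1, 0.7, 0.2]),
)
```

Normalizing by the number of retained actions rather than the full trajectory length keeps the gradient scale comparable across trajectories with different numbers of critical actions; whether CARL uses a hard threshold or a softer weighting is not stated in the abstract.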