This work extends an existing virtual multi-agent platform called RoboSumo to create TripleSumo -- a platform for investigating multi-agent cooperative behaviors with physical contact, in continuous action spaces and an adversarial environment. In this paper we investigate a scenario in which two agents, namely `Bug' and `Ant', must team up and push a third agent, `Spider', out of the arena. To tackle this goal, the newly added agent `Bug' is trained during an ongoing match between `Ant' and `Spider'. `Bug' must develop awareness of the other agents' actions, infer the strategies of both sides, and eventually learn an action policy for cooperation. The reinforcement learning algorithm Deep Deterministic Policy Gradient (DDPG) is implemented with a hybrid reward structure that combines dense and sparse rewards. The cooperative behavior is quantitatively evaluated by the mean probability of winning the match and the mean number of steps needed to win.
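As an illustration of the kind of hybrid reward structure described above, the sketch below combines a dense shaping term with a sparse terminal outcome. The function name, weights, and signal choices are assumptions for exposition, not the values used in the paper:

```python
def hybrid_reward(dist_to_opponent: float, opponent_out: bool, agent_out: bool) -> float:
    """Hypothetical hybrid reward: dense shaping plus sparse match outcome.

    Weights and terms are illustrative assumptions, not the paper's values.
    """
    # Dense component: small per-step signal encouraging the agent
    # to close the distance to the opponent it must push out.
    dense = -0.1 * dist_to_opponent

    # Sparse component: large terminal reward only when the match ends.
    sparse = 0.0
    if opponent_out:       # opponent pushed out of the arena -> win
        sparse = 100.0
    elif agent_out:        # agent itself left the arena -> loss
        sparse = -100.0

    return dense + sparse
```

In this style of design, the dense term keeps the gradient signal informative during long matches, while the sparse term anchors the policy to the actual win/lose objective.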