Autonomous agents that interact with one another on behalf of humans are becoming more common in many social domains, such as customer service, transportation, and health care. In such social situations, greedy strategies can reduce the positive outcome for all agents, for example by causing stop-and-go traffic on highways or a denial of service on a communications channel. Instead, we desire autonomous decision-making that performs efficiently while also accounting for the equitability of the group, so as to avoid these pitfalls. Unfortunately, in complex situations it is far easier to design machine learning objectives for selfish strategies than for equitable behaviors. Here we present a simple way to reward groups of agents, in both evolutionary and reinforcement learning domains, by the performance of their weakest member. We show how this yields ``fairer'', more equitable behavior while still maximizing individual outcomes, and we relate it to the biological selection mechanisms of group-level selection and inclusive fitness theory.
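As a minimal sketch of the reward scheme described above (the function name and NumPy usage are illustrative assumptions, not the authors' code), each agent's individual return could simply be replaced by the minimum return across the group, so the only way for any agent to improve its reward signal is to raise the performance of the worst-off member:

```python
import numpy as np

def weakest_member_rewards(individual_returns):
    """Replace each agent's own return with the group's minimum return.

    individual_returns: sequence of length n_agents giving each agent's
    selfish return for an episode (reinforcement learning) or an
    evaluation (evolution). Returns an array of the same length in
    which every agent receives the performance of the weakest member.
    """
    returns = np.asarray(individual_returns, dtype=float)
    return np.full_like(returns, returns.min())

# Example: three agents with unequal selfish returns all receive the
# weakest member's score, so helping agent 1 benefits everyone.
print(weakest_member_rewards([10.0, 4.0, 7.0]))  # -> [4. 4. 4.]
```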