Artificial learning agents are mediating an ever-larger share of interactions among humans, firms, and organizations, and the intersection between mechanism design and machine learning has been heavily investigated in recent years. However, mechanism design methods often make strong assumptions about how participants behave (e.g. rationality), about the kind of knowledge designers have access to a priori (e.g. access to strong baseline mechanisms), or about what the goal of the mechanism should be (e.g. total welfare). Here we introduce HCMD-zero, a general-purpose method for constructing mechanisms that makes none of these three assumptions. HCMD-zero learns to mediate interactions among participants and adjusts the mechanism parameters to make itself more likely to be preferred by participants. It does so by remaining engaged in an electoral contest with copies of itself, thereby accessing direct feedback from participants. We test our method on a stylized resource allocation game that highlights the tension between productivity, equality, and the temptation to free ride. HCMD-zero produces a mechanism that human participants prefer over a strong baseline; it does so automatically, without requiring prior knowledge, and while using human behavioral trajectories sparingly and effectively. Our analysis shows that HCMD-zero consistently makes the mechanism policy increasingly likely to be preferred by human participants over the course of training, and that it results in a mechanism with an interpretable and intuitive policy.
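The electoral contest described above can be sketched as a toy training loop. This is a minimal illustration under strong simplifying assumptions, not the paper's actual algorithm: the mechanism is reduced to a single scalar parameter, participants are simulated with a hypothetical fixed preference, and the update rule is simple hill-climbing toward election winners.

```python
import random

def simulated_vote(theta_a, theta_b, preferred=0.7):
    # Hypothetical stand-in for a human participant: vote for whichever
    # candidate mechanism's parameter is closer to an ideal point that
    # is unknown to the learner. (Illustrative only.)
    return theta_a if abs(theta_a - preferred) < abs(theta_b - preferred) else theta_b

def train(rounds=2000, step=0.05, n_voters=11, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # incumbent mechanism parameter
    for _ in range(rounds):
        # Run an election between the incumbent and a perturbed copy of itself.
        challenger = min(1.0, max(0.0, theta + rng.uniform(-step, step)))
        votes = [simulated_vote(theta, challenger) for _ in range(n_voters)]
        if votes.count(challenger) > votes.count(theta):
            # Adopt the mechanism that participants preferred.
            theta = challenger
    return theta

print(train())
```

Because each accepted challenger is strictly closer to the voters' ideal point, the incumbent parameter drifts toward what participants prefer, which is the core feedback signal the abstract describes.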