Peer prediction refers to a collection of mechanisms for eliciting information from human agents when direct verification of the obtained information is unavailable. These mechanisms are designed to admit a game-theoretic equilibrium in which every agent truthfully reveals their private information. This result holds under the assumption that agents are Bayesian and that each adopts a fixed strategy across all tasks. Human agents, however, are observed in many domains to exhibit learning behavior in sequential settings. In this paper, we explore the dynamics of sequential peer prediction mechanisms when participants are learning agents. We first show that the no-regret property alone for the agents' learning algorithms cannot guarantee convergence to the truthful strategy. We then focus on a family of learning algorithms in which strategy updates depend only on agents' cumulative rewards, and prove that agents' strategies in the popular Correlated Agreement (CA) mechanism converge to truthful reporting when they use algorithms from this family. Algorithms in this family are not necessarily no-regret, but the family includes several familiar no-regret learning algorithms (e.g., multiplicative weight update and Follow the Perturbed Leader) as special cases. Simulations of several algorithms in this family, as well as the $\epsilon$-greedy algorithm, which falls outside of it, show convergence to the truthful strategy in the CA mechanism.
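To make the defining property of this algorithm family concrete, here is a minimal sketch of multiplicative weight update, one of its no-regret members: at every round the agent's mixed strategy is a function of the cumulative-reward vector alone. The two-action setup (truthful report vs. misreport) and the stand-in reward values below are hypothetical illustrations, not the CA mechanism's actual payments.

```python
import numpy as np

def mwu_strategy(cumulative_rewards: np.ndarray, eta: float = 0.1) -> np.ndarray:
    """Multiplicative weights: the mixed strategy depends only on the
    vector of cumulative rewards, as required of this family."""
    shifted = eta * (cumulative_rewards - cumulative_rewards.max())  # shift for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()

# Toy full-information run. In the paper's setting the rewards would come
# from the CA mechanism; here they are stand-in values for illustration.
rng = np.random.default_rng(0)
cum = np.zeros(2)  # action 0: report truthfully, action 1: misreport
for t in range(1_000):
    p = mwu_strategy(cum)  # this round's mixed strategy, a function of cum alone
    rewards = np.array([1.0, 0.4]) + 0.1 * rng.standard_normal(2)  # stand-in rewards
    cum += rewards  # full-information update: every action's cumulative reward grows
print(p)  # probability mass concentrates on the higher-cumulative-reward action
```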