We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about which actions to take over time. In each time step, the agent takes an action based on the current state, the principal's advice/signal, and the agent's beliefs about the external parameter. The agent's action updates the state according to a stochastic process. The model arises naturally in many applications; for example, an app (the principal) can advise the user (the agent) on which action to choose based on additional real-time information the app has. We study the problem of designing a signaling strategy from the principal's point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and that this optimal strategy can be computed in polynomial time. In contrast, it is NP-hard to approximate an optimal policy against a far-sighted agent. Further, we show that if the principal has the power to threaten the agent by withholding future signals, then we can efficiently design a threat-based strategy. This strategy guarantees the principal the same payoff as if playing against an agent who is far-sighted but myopic to future signals.