Bayesian persuasion studies how an informed sender should partially disclose information to influence the behavior of a self-interested receiver. Classical models make the stringent assumption that the sender knows the receiver's utility. This can be relaxed by considering an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type. We study, for the first time, an online Bayesian persuasion setting with multiple receivers. We focus on the case with no externalities and binary actions, as customary in offline models. Our goal is to design no-regret algorithms for the sender with polynomial per-iteration running time. First, we prove a negative result: for any $0 < \alpha \leq 1$, there is no polynomial-time no-$\alpha$-regret algorithm when the sender's utility function is supermodular or anonymous. Then, we focus on the case of submodular sender's utility functions and we show that, in this case, it is possible to design a polynomial-time no-$(1 - \frac{1}{e})$-regret algorithm. To do so, we introduce a general online gradient descent scheme to handle online learning problems with a finite number of possible loss functions. This requires the existence of an approximate projection oracle. We show that, in our setting, there exists one such projection oracle which can be implemented in polynomial time.
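For reference, the no-$\alpha$-regret guarantee mentioned above is commonly formalized as below. This is a sketch of the standard definition rather than a quotation from the paper: the horizon $T$, the sender's signaling scheme $\phi^t$ at round $t$, the receivers' type profile $k^t$, and the sender's expected utility $f$ are notation assumed here for illustration.

$$
R^{\alpha}_T \;=\; \alpha \, \max_{\phi} \sum_{t=1}^{T} f(\phi, k^t) \;-\; \mathbb{E}\left[ \sum_{t=1}^{T} f(\phi^t, k^t) \right],
\qquad
\text{no-}\alpha\text{-regret} \iff R^{\alpha}_T = o(T).
$$

Under this reading, $\alpha = 1$ recovers the usual notion of no-regret, and the positive result for submodular utilities corresponds to $\alpha = 1 - \frac{1}{e}$.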