We introduce and study the online Bayesian recommendation problem for a platform that can observe a utility-relevant state of a product while repeatedly interacting with a population of myopic users through an online recommendation mechanism. This paradigm is common in a wide range of scenarios in the current Internet economy. For each user, who holds her own private preference and belief, the platform commits to a recommendation strategy that exploits its information advantage on the product state to persuade the self-interested user to follow the recommendation. The platform does not know the users' preferences and beliefs, and must employ an adaptive recommendation strategy that persuades while gradually learning those preferences and beliefs over the course of the interaction. We aim to design online learning policies with no Stackelberg regret for the platform, i.e., policies that compete against the optimal policy in hindsight under the assumption that users correspondingly adapt their behaviors to the benchmark policy. Our first result is an online policy whose regret depends doubly logarithmically on the number of rounds. We then present a hardness result showing that no adaptive online policy can achieve regret with a better dependence on the number of rounds. Finally, by formulating the platform's problem as optimizing a linear program with membership oracle access, we present our second online policy, whose regret depends polynomially on the number of states but only logarithmically on the number of rounds.
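For concreteness, a minimal sketch of the Stackelberg regret notion described above, in notation of our own choosing (the symbols $\pi_t$, $b_t$, and $u$ are illustrative assumptions, not the paper's): over $T$ rounds the platform plays recommendation strategies $\pi_1, \dots, \pi_T$, each round-$t$ user best-responds to the strategy she faces via $b_t(\cdot)$, and the benchmark is the best fixed strategy in hindsight evaluated against users who best-respond to it,
\[
\mathrm{Reg}(T) \;=\; \max_{\pi^\ast \in \Pi} \sum_{t=1}^{T} u\big(\pi^\ast, b_t(\pi^\ast)\big) \;-\; \sum_{t=1}^{T} u\big(\pi_t, b_t(\pi_t)\big),
\]
where $u$ denotes the platform's utility. Under this reading, the first result gives $\mathrm{Reg}(T) = O(\log\log T)$ and the hardness result shows this dependence on $T$ cannot be improved by any adaptive online policy.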