We design recommendation systems that incentivize efficient exploration. Agents arrive sequentially, choose actions, and receive rewards drawn from fixed but unknown action-specific distributions. The recommendation system shows each agent the actions and rewards of a subsequence of past agents, where the subsequence is chosen ex ante. The agents thus engage in sequential social learning, moderated by these subsequences. We asymptotically attain the optimal regret rate for exploration under a flexible frequentist behavioral model, while mitigating the rationality and commitment assumptions inherent in prior work. We suggest three components of effective recommendation systems: independent focus groups, group aggregators, and interlaced information structures.
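The setting described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the construction from this work: rewards are assumed to be Bernoulli, each agent is assumed to pick the action with the highest empirical mean among the observations it is shown (valuing an unobserved action at 0.5), and the names `simulate`, `choose_action`, `full_history`, `focus_group`, and `regret` are hypothetical. The subsequence shown to agent `t` depends only on `t`, so it is fixed ex ante.

```python
import random
from typing import Callable, List, Sequence, Tuple

History = List[Tuple[int, float]]  # one (action, reward) pair per past agent


def choose_action(visible: History, n_actions: int, rng: random.Random) -> int:
    """Assumed behavior: greedy on empirical means of the visible data."""
    totals = [0.0] * n_actions
    counts = [0] * n_actions
    for action, reward in visible:
        totals[action] += reward
        counts[action] += 1
    # Assumption: an action with no visible observations is valued at 0.5.
    estimates = [
        totals[a] / counts[a] if counts[a] else 0.5 for a in range(n_actions)
    ]
    best = max(estimates)
    # Ties are broken uniformly at random.
    return rng.choice([a for a in range(n_actions) if estimates[a] == best])


def full_history(t: int) -> Sequence[int]:
    """Agent t sees every earlier agent."""
    return range(t)


def focus_group(size: int) -> Callable[[int], Sequence[int]]:
    """Agent t sees only earlier agents in its own block of `size` agents."""
    return lambda t: range((t // size) * size, t)


def simulate(
    means: Sequence[float],
    n_agents: int,
    subsequence: Callable[[int], Sequence[int]],
    seed: int = 0,
) -> History:
    """Run agents sequentially; `subsequence(t)` lists the past agents shown to t."""
    rng = random.Random(seed)
    history: History = []
    for t in range(n_agents):
        visible = [history[s] for s in subsequence(t)]
        action = choose_action(visible, len(means), rng)
        # Assumption: Bernoulli reward with the chosen action's mean.
        reward = 1.0 if rng.random() < means[action] else 0.0
        history.append((action, reward))
    return history


def regret(history: History, means: Sequence[float]) -> float:
    """Cumulative expected regret relative to always playing the best action."""
    best = max(means)
    return sum(best - means[action] for action, _ in history)


if __name__ == "__main__":
    means = [0.3, 0.7]
    for name, policy in [("full history", full_history),
                         ("focus groups of 10", focus_group(10))]:
        run = simulate(means, 200, policy, seed=1)
        print(name, regret(run, means))
```

Swapping the `subsequence` argument changes only what each agent observes, which is the design lever the abstract refers to. Group aggregators and interlaced structures are not modeled in this sketch.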