Randomized experiments can be susceptible to selection bias due to potential non-compliance by the participants. While much of the existing work has studied compliance as a static behavior, we propose a game-theoretic model to study compliance as a dynamic behavior that may change over time. Over a sequence of rounds, a social planner interacts with heterogeneous agents, each of whom arrives with an unobserved private type that determines both their prior preferences across the actions (e.g., control and treatment) and their baseline reward without taking any treatment. The planner provides each agent with a randomized recommendation that may alter their beliefs and their action selection. We develop a novel recommendation mechanism that views the planner's recommendation as a form of instrumental variable (IV) that affects only an agent's action selection, but not the observed rewards directly. We construct such IVs by carefully mapping the history -- the interactions between the planner and the previous agents -- to a random recommendation. Even though the initial agents may be completely non-compliant, our mechanism can incentivize compliance over time, thereby enabling the estimation of the treatment effect of each treatment and minimizing the cumulative regret of the planner, whose goal is to identify the optimal treatment.
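To make the instrumental-variable idea concrete, the following is a minimal sketch and not the mechanism proposed here. It assumes a single binary treatment, a binary private type that raises both the baseline reward and the tendency to self-select into treatment, and a fixed compliance probability (the proposed mechanism instead lets compliance change over time by mapping the history to the recommendation). All names (`simulate`, `comply_prob`, `naive_estimate`, `wald_estimate`) and all numeric values are hypothetical choices for illustration. Under these assumptions, the naive treated-versus-control comparison is confounded by the private type, while the Wald (IV) ratio that uses the randomized recommendation as the instrument stays close to the true effect.

```python
import random


def simulate(n, theta, comply_prob, seed=0):
    """Simulate n agents; returns a list of (z, x, y) tuples.

    z: randomized recommendation (the instrument), independent of the type
    x: action actually taken (1 = treatment, 0 = control)
    y: observed reward = theta * x + type-dependent baseline + noise
    All modeling choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = 1 if rng.random() < 0.5 else 0   # unobserved private type
        z = 1 if rng.random() < 0.5 else 0   # recommendation, drawn at random
        if rng.random() < comply_prob:
            x = z                            # agent follows the recommendation
        else:
            x = u                            # agent follows its own prior preference
        baseline = 2.0 if u == 1 else 0.0    # the type also shifts the baseline reward
        y = theta * x + baseline + rng.gauss(0.0, 1.0)
        data.append((z, x, y))
    return data


def mean(values):
    return sum(values) / len(values)


def naive_estimate(data):
    """Difference in mean reward between treated and untreated agents."""
    treated = [y for _, x, y in data if x == 1]
    control = [y for _, x, y in data if x == 0]
    return mean(treated) - mean(control)


def wald_estimate(data):
    """Wald (IV) estimator with the recommendation z as the instrument."""
    y1 = mean([y for z, _, y in data if z == 1])
    y0 = mean([y for z, _, y in data if z == 0])
    x1 = mean([x for z, x, _ in data if z == 1])
    x0 = mean([x for z, x, _ in data if z == 0])
    return (y1 - y0) / (x1 - x0)


if __name__ == "__main__":
    data = simulate(n=200_000, theta=1.0, comply_prob=0.3)
    print("naive estimate:", naive_estimate(data))
    print("IV (Wald) estimate:", wald_estimate(data))
```

The Wald ratio is only well defined when the recommendation shifts behavior at all: if no agent complies, the denominator is zero in expectation, which is why incentivizing at least partial compliance matters for estimation.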