In this paper, we study endogeneity problems in algorithmic decision-making where data and actions are interdependent. When there are endogenous covariates in a contextual multi-armed bandit model, a novel bias (self-fulfilling bias) arises because the endogeneity of the covariates spills over to the actions. We propose a class of algorithms to correct for the bias by incorporating instrumental variables into leading online learning algorithms. These algorithms also attain regret levels that match the best known lower bound for the cases without endogeneity. To establish the theoretical properties, we develop a general technique that untangles the interdependence between data and actions.
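To make the role of the instrumental variable concrete, the following is a minimal sketch of the static endogeneity problem only; it is not the paper's bandit algorithm and does not model the self-fulfilling bias that arises from actions. It assumes a hypothetical scalar model in which a covariate x = z + u is correlated with the reward noise through an unobserved term u, while the instrument z affects x but is independent of the noise. All names (`simulate`, `ols_slope`, `iv_slope`) and parameter values are illustrative assumptions. Under these assumptions the ordinary least-squares slope converges to beta + cov(x, e)/var(x) = beta + 0.5, whereas the instrumental-variable ratio cov(z, y)/cov(z, x) converges to beta.

```python
import random


def simulate(n=20000, beta=2.0, seed=0):
    """Draw n samples from a hypothetical scalar model with an endogenous covariate."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)       # instrument: drives x, independent of the noise
        u = rng.gauss(0.0, 1.0)       # unobserved term shared by x and the noise
        x = z + u                     # endogenous covariate
        e = u + rng.gauss(0.0, 1.0)   # noise correlated with x through u
        y = beta * x + e              # observed outcome
        zs.append(z)
        xs.append(x)
        ys.append(y)
    return zs, xs, ys


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def ols_slope(xs, ys):
    """Least-squares slope of y on x; biased when x is correlated with the noise."""
    return cov(xs, ys) / cov(xs, xs)


def iv_slope(zs, xs, ys):
    """Instrumental-variable (ratio) estimator of the slope using instrument z."""
    return cov(zs, ys) / cov(zs, xs)


if __name__ == "__main__":
    zs, xs, ys = simulate()
    print("OLS estimate:", ols_slope(xs, ys))
    print("IV estimate: ", iv_slope(zs, xs, ys))
```

In a bandit setting the estimate would also feed back into which actions are chosen, which is the interdependence between data and actions that the paper analyzes; this sketch deliberately leaves that feedback out.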