Data that is gathered adaptively --- via bandit algorithms, for example --- exhibits bias. This is true both when gathering simple numeric-valued data --- the empirical means maintained by stochastic bandit algorithms are biased downwards --- and when gathering more complex data --- hypothesis tests run on data gathered via contextual bandit algorithms suffer from false discovery. In this paper, we show that this problem is mitigated if the data collection procedure is differentially private. This lets us both bound the bias of simple numeric-valued quantities (like the empirical means of stochastic bandit algorithms) and correct the p-values of hypothesis tests run on the adaptively gathered data. Moreover, there exist differentially private bandit algorithms with near-optimal regret bounds: we apply existing theorems in the simple stochastic case, and give a new analysis for linear contextual bandits. We complement our theoretical results with experiments validating our theory.
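As a self-contained illustration of the downward-bias phenomenon (this is not code from the paper), the following Python sketch runs a greedy bandit on two statistically identical Gaussian arms, so any gap between the final empirical means and the true means is pure selection bias. The function name run_greedy, the horizon of 100, and the Laplace noise_scale are all hypothetical choices for this example, and the Laplace perturbation of the selection scores is only a crude stand-in for a genuinely differentially private bandit algorithm; under these assumptions one would expect the perturbed variant to show bias closer to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_greedy(true_means, horizon, noise_scale=0.0):
    """Pull each arm once, then repeatedly pull the arm with the highest
    (optionally Laplace-perturbed) empirical mean; return final empirical means."""
    k = len(true_means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(horizon):
        if t < k:
            arm = t  # initial round-robin so every empirical mean is defined
        else:
            scores = sums / counts
            if noise_scale > 0:
                # crude stand-in for private selection: randomize the argmax
                scores = scores + rng.laplace(0.0, noise_scale, size=k)
            arm = int(np.argmax(scores))
        sums[arm] += rng.normal(true_means[arm], 1.0)  # unit-variance Gaussian rewards
        counts[arm] += 1
    return sums / counts

true_means = np.zeros(2)  # identical arms: any estimation gap is pure bias
reps = 5000
plain = np.mean([run_greedy(true_means, 100) for _ in range(reps)], axis=0)
noisy = np.mean([run_greedy(true_means, 100, noise_scale=0.5) for _ in range(reps)], axis=0)
print("greedy bias per arm:      ", plain)  # typically clearly negative
print("noisy-greedy bias per arm:", noisy)  # typically closer to zero
```

The mechanism behind the negative bias is visible in the code: an arm whose empirical mean happens to dip gets pulled less often, so the unlucky estimate is frozen in place, while lucky estimates keep being revised toward the truth. Randomizing the selection weakens the dependence between which arm is pulled and the data already collected, which is the intuition the paper makes precise through differential privacy.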