The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive Experiments and a Paradox Concerning Logging Policy

The doubly robust (DR) estimator, which relies on two nuisance parameters, the conditional mean outcome and the logging policy (the probability of choosing an action), is crucial in causal inference. This paper proposes a DR estimator for dependent samples obtained from adaptive experiments. To obtain an asymptotically normal semiparametric estimator from dependent samples with non-Donsker nuisance estimators, we propose adaptive-fitting, a variant of sample-splitting. We also report an empirical paradox: our proposed DR estimator, which uses an estimated logging policy, tends to outperform other estimators that use the true logging policy. While a similar phenomenon is known for estimators with i.i.d. samples, traditional explanations based on asymptotic efficiency cannot account for our case with dependent samples. We confirm this empirical finding through simulation studies.
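To make the two nuisance components concrete, the following is a minimal sketch of a DR value estimate in a simplified, context-free setting. It is an illustration under stated assumptions, not the paper's implementation: the function names `running_mean_outcomes` and `dr_value` are hypothetical, the outcome model is a plain running mean, and the logging-policy estimates `g_hat` are assumed to be supplied by the caller. It also assumes one reading of adaptive-fitting, namely that the nuisance estimates used at round `t` are built only from rounds before `t`.

```python
import numpy as np


def running_mean_outcomes(actions, rewards, n_actions):
    """Outcome-model estimates built only from past rounds (assumed reading
    of adaptive-fitting). f_hat[t, a] is the mean reward of action a over
    rounds strictly before t, or 0 if a has not been chosen yet."""
    T = len(actions)
    f_hat = np.zeros((T, n_actions))
    sums = np.zeros(n_actions)
    counts = np.zeros(n_actions)
    for t in range(T):
        # Record the estimate BEFORE round t's data is added.
        f_hat[t] = sums / np.maximum(counts, 1)
        sums[actions[t]] += rewards[t]
        counts[actions[t]] += 1
    return f_hat


def dr_value(actions, rewards, f_hat, g_hat, pi_eval):
    """Doubly robust estimate of the value of pi_eval (context-free case).

    actions : (T,) int array of chosen actions
    rewards : (T,) float array of observed outcomes
    f_hat   : (T, K) outcome-model estimates available at each round
    g_hat   : (T, K) logging-policy estimates (or true probabilities)
    pi_eval : (K,) action probabilities of the policy being evaluated
    """
    T = f_hat.shape[0]
    idx = np.arange(T)
    # Importance-weighted residual for the action that was actually taken.
    correction = (
        pi_eval[actions] * (rewards - f_hat[idx, actions]) / g_hat[idx, actions]
    )
    # Model-based term: expected outcome under pi_eval according to f_hat.
    direct = f_hat @ pi_eval
    return float(np.mean(correction + direct))


# Tiny hand-made log: two actions chosen alternately with probability 0.5.
actions = np.array([0, 1, 0, 1])
rewards = np.array([1.0, 0.0, 1.0, 0.0])
f_hat = running_mean_outcomes(actions, rewards, n_actions=2)
g_hat = np.full((4, 2), 0.5)
pi_eval = np.array([1.0, 0.0])  # always choose action 0
estimate = dr_value(actions, rewards, f_hat, g_hat, pi_eval)
```

In this sketch, when the outcome model already matches the observed rewards, the correction term vanishes and the estimate no longer depends on `g_hat`, which is one side of the double robustness property. Replacing `g_hat` with either the true or an estimated logging policy is the comparison the abstract's paradox refers to.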
