The purpose of this paper is to connect the "policy choice" problem, proposed in Kasy and Sautmann (2021), to the frontiers of the bandit literature in machine learning. We discuss how the policy choice problem can be framed so that it is identical to what is called the "best arm identification" (BAI) problem. By connecting the two literatures, we identify that the asymptotic optimality of policy choice algorithms tackled in Kasy and Sautmann (2021) is a long-standing open question in the literature. Unfortunately, this connection highlights several major issues with the main theorem of that paper. In particular, we show that Theorem 1 in Kasy and Sautmann (2021) is false. We find that the proofs of statements (1) and (2) of Theorem 1 are incorrect; the statements themselves may be true, but fixing the proofs is non-trivial. Statement (3), on the other hand, is false, as is its proof, which we show by utilizing existing theoretical results in the bandit literature. As this question is critically important and has garnered much interest within the bandit community over the last decade, we provide a review of recent developments in the BAI literature. We hope this review serves to highlight the relevance of these developments to economic problems and to stimulate methodological and theoretical developments in the econometric community.