可测试学习中成员查询的局限性 (Limitations of Membership Queries in Testable Learning)

Membership queries (MQ) often yield speedups for learning tasks, particularly in the distribution-specific setting. We show that in the \emph{testable learning} model of Rubinfeld and Vasilyan [RV23], membership queries cannot decrease the time complexity of testable learning algorithms beyond the complexity of sample-only distribution-specific learning. In the testable learning model, the learner must output a hypothesis whenever the data distribution satisfies a desired property, and if it outputs a hypothesis, the hypothesis must be near-optimal. We give a general reduction from sample-based \emph{refutation} of boolean concept classes, as presented in [Vadhan17, KL18], to testable learning with queries (TL-Q). This yields lower bounds for TL-Q via the reduction from learning to refutation given in [KL18]. The result is that, relative to a concept class and a distribution family, no $m$-sample TL-Q algorithm can be super-polynomially more time-efficient than the best $m$-sample PAC learner. Finally, we define a class of ``statistical'' MQ algorithms that encompasses many known distribution-specific MQ learners, such as those based on influence estimation or subcube-conditional statistical queries. We show that TL-Q algorithms in this class imply efficient statistical-query refutation and learning algorithms. Thus, combined with known SQ dimension lower bounds, our results imply that these efficient membership query learners cannot be made testable.

翻译：成员查询（MQ）通常能加速学习任务，尤其在分布特定的设定中。我们证明，在Rubinfeld和Vasilyan [RV23]提出的可测试学习模型中，成员查询无法将可测试学习算法的时间复杂度降低至超越仅基于样本的分布特定学习的复杂度。在可测试学习模型中，当数据分布满足特定性质时，学习者必须输出一个假设；若输出假设，该假设必须接近最优。我们提出了一种从布尔概念类的基于样本的证伪（如[Vadhan17, KL18]所述）到带查询的可测试学习（TL-Q）的通用归约。通过[KL18]中给出的从学习到证伪的归约，这为TL-Q提供了下界。结果表明，相对于一个概念类和一个分布族，任何基于m个样本的TL-Q算法在时间效率上不可能比最优的m样本PAC学习器具有超多项式优势。最后，我们定义了一类“统计型”MQ算法，涵盖了许多已知的分布特定MQ学习器，例如基于影响估计或子立方条件统计查询的算法。我们证明，此类TL-Q算法隐含了高效的统计查询证伪与学习算法。因此，结合已知的SQ维度下界，我们的结果表明这些高效的成员查询学习器无法被改造为可测试的。