Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery and as a means of detecting distribution shifts. This has led to the development of many sophisticated test procedures that go beyond the standard supervised learning frameworks and whose use can require specialized knowledge about two-sample testing. We use a simple test that takes the mean discrepancy of a witness function as the test statistic, and we prove that minimizing a squared loss leads to a witness with optimal testing power. This allows us to leverage recent advancements in AutoML. Without any user input about the problems at hand, and using the same method for all our experiments, our AutoML two-sample test achieves competitive performance on a diverse distribution shift benchmark as well as on challenging two-sample testing problems. We provide an implementation of the AutoML two-sample test in the Python package autotst.
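The idea of a witness-based test can be illustrated with a minimal sketch. It is not the autotst API: the function name `witness_two_sample_test` is hypothetical, a linear least-squares regressor stands in for the AutoML model, and the half-and-half data split and permutation-based p-value are assumptions made for illustration. The witness is fit by minimizing a squared loss on labels 1 and 0, and the test statistic is the mean discrepancy of the witness on held-out data.

```python
import numpy as np


def witness_two_sample_test(x, y, n_permutations=1000, seed=0):
    """Permutation p-value for H0: x and y come from the same distribution.

    x, y: arrays of shape (n_samples, n_features).
    """
    rng = np.random.default_rng(seed)

    # Assumption: split each sample in half, one part to fit the witness
    # and one part to evaluate the test statistic.
    x_tr, x_te = x[: len(x) // 2], x[len(x) // 2:]
    y_tr, y_te = y[: len(y) // 2], y[len(y) // 2:]

    # Fit the witness by minimizing squared loss with labels 1 (x) and 0 (y).
    # Assumption: a linear model with intercept replaces the AutoML regressor.
    feats = np.vstack([x_tr, y_tr])
    design = np.hstack([feats, np.ones((len(feats), 1))])
    labels = np.concatenate([np.ones(len(x_tr)), np.zeros(len(y_tr))])
    coef, *_ = np.linalg.lstsq(design, labels, rcond=None)

    def witness(z):
        return np.hstack([z, np.ones((len(z), 1))]) @ coef

    # Test statistic: mean discrepancy of the witness on held-out data.
    h = np.concatenate([witness(x_te), witness(y_te)])
    n = len(x_te)
    observed = h[:n].mean() - h[n:].mean()

    # Null distribution: recompute the statistic under random relabelings.
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(h)
        if perm[:n].mean() - perm[n:].mean() >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)
```

Calling `witness_two_sample_test(x, y)` on two arrays of shape `(n_samples, n_features)` returns a p-value; a small value suggests the two samples differ in distribution. Swapping the linear model for a more flexible regressor changes only the fitting step.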