Distribution testing is a fundamental statistical task with many applications, but we are interested in a variety of problems where systematic mislabelings of the sample prevent us from applying the existing theory. To apply distribution testing to these problems, we introduce distribution testing under the parity trace, where the algorithm receives an ordered sample $S$ that reveals only the least significant bit of each element. This abstraction reveals connections between the following three problems of interest, allowing new upper and lower bounds: 1. In distribution testing with a confused collector, the collector of the sample may be incapable of distinguishing between nearby elements of a domain (e.g. a machine learning classifier). We prove bounds for distribution testing with a confused collector on domains structured as a cycle or a path. 2. Recent work on the fundamental testing vs. learning question established tight lower bounds on distribution-free sample-based property testing by reduction from distribution testing, but the tightness is limited to symmetric properties. The parity trace allows a broader family of equivalences to non-symmetric properties, while recovering and strengthening many of the previous results with a different technique. 3. We give the first results for property testing in the well-studied trace reconstruction model, where the goal is to test whether an unknown string $x$ satisfies some property or is far from satisfying that property, given only independent random traces of $x$. Our main technical result is a tight bound of $\widetilde \Theta\left((n/\epsilon)^{4/5} + \sqrt n/\epsilon^2\right)$ for testing uniformity of distributions over $[n]$ under the parity trace, leading also to results for the problems above.
翻译:摘要:分布检验是一种基本的统计任务,具有许多应用,但我们感兴趣的是在样本存在系统的错误标记时,无法应用现有理论的各种问题。为了将分布检验应用于这些问题,我们引入了奇偶追踪下的分布检验,其中算法接收一个有序样本$S$,该样本仅显示每个元素的最低有效位。这种抽象模型揭示了以下三个问题之间的联系,并允许新的上限和下限:1.在具有混淆收集器的分布检验中,样本的收集器可能无法区分域中相邻的元素(例如机器学习分类器)。我们证明了在环形或路径结构域上进行有混淆收集器的分布检测的界限。2.关于基本的测试与学习问题的最近工作通过从分布检验进行缩减来建立分布自由的基于样本的性质测试的紧密下界,但紧密程度仅适用于对称性质。奇偶追踪允许更广泛的对称到非对称性质的等价性家族,并使用不同的技术恢复和加强了许多先前结果。3.我们提供了在广为研究的迹重建模型下进行性质检验的第一个结果,其中仅在$x$的独立随机迹的情况下检验是否满足某个性质或远离该性质。我们的主要技术结果是关于在奇偶追踪下测试分布在$[n]$上均匀性的紧密界限$\widetilde \Theta\left((n/\epsilon)^{4/5} + \sqrt n/\epsilon^2\right)$,也导致了上述问题的结果。