We consider the problem of determining which classes of functions can be tested more efficiently than they can be learned, in the distribution-free sample-based model that corresponds to the standard PAC learning setting. Our main result shows that while VC dimension by itself does not always provide tight bounds on the number of samples required to test a class of functions in this model, it can be combined with a closely-related variant that we call "lower VC" (or LVC) dimension to obtain strong lower bounds on this sample complexity. We use this result to obtain strong and in many cases nearly optimal lower bounds on the sample complexity for testing unions of intervals, halfspaces, intersections of halfspaces, polynomial threshold functions, and decision trees. Conversely, we show that two natural classes of functions, juntas and monotone functions, can be tested with a number of samples that is polynomially smaller than the number of samples required for PAC learning. Finally, we also use the connection between VC dimension and property testing to establish new lower bounds for testing radius clusterability and testing feasibility of linear constraint systems.
翻译:我们考虑了在与标准PAC学习环境相对应的无分布式样本模型中确定哪些类别的功能可以比所学得更高效地测试的问题。我们的主要结果表明,虽然VC层面本身并不总能对测试该模型中某类功能所需的样本数量提供严格的限制,但它可以与一个密切相关的变量相结合,我们称之为“低VC”(或LVC)层面,以获得关于这一样本复杂性的更强的下限。我们利用这一结果在样本复杂性方面获得了强力,在许多情况下,在样本复杂性方面获得了几乎最佳的较低界限,以测试间隔、半空、半空的交叉点、多数值阈值功能和决策树。相反,我们表明,两种自然的功能类别,即军政府军和单体内功能,可以与一些比PAC学习所需的样本数量多得多的样本进行测试。最后,我们还利用VC层面与财产测试之间的联系,为测试半径集性和线性约束系统的可行性设定新的较低界限。