We consider the problem of approximating a function in general nonlinear subsets of L2 when only a weighted Monte Carlo estimate of the L2-norm can be computed. Of particular interest in this setting is the concept of sample complexity, the number of sample points that are necessary to achieve a prescribed error with high probability. Reasonable worst-case bounds for this quantity exist only for particular subsets of L2, like linear spaces or sets of sparse vectors. For more general subsets, like tensor networks, the currently existing bounds are very pessimistic. By restricting the model class to a neighbourhood of the best approximation, we can derive improved worst-case bounds for the sample complexity. When the considered neighbourhood is a manifold with positive local reach, the sample complexity can be estimated by the sample complexity of the tangent space and the product of the sample complexity of the normal space and the manifold's curvature.
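The weighted Monte Carlo estimate of the L2-norm mentioned above can be illustrated with a minimal sketch. The function name `empirical_l2_norm` and the concrete example are ours, not from the paper; the weights are assumed to correct for the density the sample points are drawn from, so that uniform sampling on the domain yields unit weights.

```python
import numpy as np

def empirical_l2_norm(v, sample_points, weights):
    """Weighted Monte Carlo estimate of the L2-norm of v.

    ||v||_n^2 = (1/n) * sum_i w(x_i) * v(x_i)^2,
    where the x_i are drawn from a sampling density and the
    weights w(x_i) account for the change of measure.
    """
    values = v(sample_points)
    return np.sqrt(np.mean(weights * values**2))

# Illustration: estimate the L2([0, 1]) norm of v(x) = x from
# uniform samples; the exact value is 1/sqrt(3) ≈ 0.577.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)
w = np.ones_like(x)  # uniform sampling on [0, 1] -> unit weights
estimate = empirical_l2_norm(lambda t: t, x, w)
```

The sample complexity question asks how large the sample size must be so that such an estimate is, with high probability, uniformly reliable over the whole model class.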