We consider the problem of approximating a function in a general nonlinear subset of $L^2$, when only a weighted Monte Carlo estimate of the $L^2$-norm can be computed. Of particular interest in this setting is the concept of sample complexity, the number of sample points that are necessary to achieve a prescribed error with high probability. Reasonable worst-case bounds for this quantity exist only for particular model classes, like linear spaces or sets of sparse vectors. For more general sets, like tensor networks or neural networks, the currently existing bounds are very pessimistic. By restricting the model class to a neighbourhood of the best approximation, we can derive improved worst-case bounds for the sample complexity. When the considered neighbourhood is a manifold with positive local reach, its sample complexity can be estimated by means of the sample complexities of the tangent and normal spaces and the manifold's curvature.
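The weighted Monte Carlo estimate of the $L^2$-norm mentioned above can be illustrated with a minimal sketch. The target function, the reference measure, and the biased sampling density below are illustrative choices, not taken from the paper: samples are drawn from a density $\mu$ and reweighted by the Radon–Nikodym derivative $\mathrm{d}\rho/\mathrm{d}\mu$ to obtain an unbiased estimate of the $L^2(\rho)$-norm.

```python
import numpy as np

# Hedged sketch: weighted Monte Carlo estimate of the L2(rho)-norm,
# with rho the uniform measure on [0, 1]. Samples are drawn from the
# biased density mu(x) = 2x, and the weight w(x) = d(rho)/d(mu) = 1/(2x)
# corrects for the bias. All concrete choices here are illustrative.

rng = np.random.default_rng(0)
n = 100_000

# Draw from mu(x) = 2x on (0, 1] via inverse transform sampling: X = sqrt(U).
x = np.sqrt(rng.uniform(size=n))
w = 1.0 / (2.0 * x)                 # Radon-Nikodym weight d(rho)/d(mu)

f = lambda t: t                     # illustrative target function

# Unbiased estimate of ||f||_{L2(rho)}^2 = int_0^1 f(t)^2 dt = 1/3.
est = np.mean(w * f(x) ** 2)
print(est)
```

Here the weight both restores unbiasedness and, for a well-chosen $\mu$, reduces the variance of the estimate; the quality of such estimates over all functions in the model class is what drives the sample complexity discussed above.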