Is Monte Carlo a bad sampling strategy for learning smooth functions in high dimensions?

This paper concerns the approximation of smooth, high-dimensional functions from limited samples using polynomials. This task lies at the heart of many applications in computational science and engineering -- notably, those arising from parametric modelling and uncertainty quantification. It is common to use Monte Carlo (MC) sampling in such applications, so as not to succumb to the curse of dimensionality. However, it is well known that this strategy is theoretically suboptimal: there are many polynomial spaces of dimension $n$ for which the sample complexity scales log-quadratically in $n$. This well-documented phenomenon has led to a concerted effort to design improved, and in fact near-optimal, strategies whose sample complexities scale log-linearly, or even linearly, in $n$. Paradoxically, in this work we show that MC is actually a perfectly good strategy in high dimensions. We first document this phenomenon via several numerical examples. Next, we present a theoretical analysis that resolves this paradox for holomorphic functions of infinitely many variables. We show that there is a least-squares scheme based on $m$ MC samples whose error decays algebraically fast in $m/\log(m)$, with a rate that is the same as that of the best $n$-term polynomial approximation. This result is non-constructive, since it assumes knowledge of a suitable polynomial space in which to perform the approximation. We next present a compressed sensing-based scheme that achieves the same rate, except for a larger polylogarithmic factor. This scheme is practical, and numerically it performs as well as or better than well-known adaptive least-squares schemes. Overall, our findings demonstrate that MC sampling is eminently suitable for smooth function approximation when the dimension is sufficiently high. Hence the benefits of improved sampling strategies are generically limited to lower-dimensional settings.
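To make the setting concrete, the following is a minimal sketch of least-squares polynomial approximation from MC samples. It is an illustration under stated assumptions, not the scheme analysed in the paper: the total-degree index set, the orthonormal tensor Legendre basis on $[-1,1]^d$ with the uniform measure, the test function `f`, and the values of `d`, `p` and `m` are all hypothetical choices made here. In particular, the sketch fixes the polynomial space in advance, whereas the theoretical result assumes knowledge of a suitable space.

```python
import itertools
import numpy as np
from numpy.polynomial import legendre


def total_degree_indices(d, p):
    """All multi-indices nu in N_0^d with nu_1 + ... + nu_d <= p (illustrative choice)."""
    return [nu for nu in itertools.product(range(p + 1), repeat=d) if sum(nu) <= p]


def design_matrix(X, indices):
    """Evaluate the orthonormal tensor Legendre basis at the rows of X in [-1, 1]^d."""
    m, d = X.shape
    p = max(max(nu) for nu in indices)
    # sqrt(2j + 1) * P_j is orthonormal w.r.t. the uniform probability measure on [-1, 1].
    scale = np.sqrt(2 * np.arange(p + 1) + 1)
    V = [legendre.legvander(X[:, k], p) * scale for k in range(d)]
    A = np.ones((m, len(indices)))
    for j, nu in enumerate(indices):
        for k in range(d):
            A[:, j] *= V[k][:, nu[k]]
    return A


def mc_least_squares(f, d, p, m, rng):
    """Fit polynomial coefficients by least squares from m uniform MC samples."""
    X = rng.uniform(-1.0, 1.0, size=(m, d))  # Monte Carlo sample points
    indices = total_degree_indices(d, p)
    A = design_matrix(X, indices)
    coeffs, *_ = np.linalg.lstsq(A, f(X), rcond=None)
    return indices, coeffs


def f(X):
    """Hypothetical smooth test function, not taken from the paper."""
    return np.exp(-np.mean(X, axis=1))


rng = np.random.default_rng(0)
d, p = 4, 4
indices, coeffs = mc_least_squares(f, d, p, m=1000, rng=rng)

# Estimate the relative L2 error on an independent MC test set.
X_test = rng.uniform(-1.0, 1.0, size=(2000, d))
y_test = f(X_test)
rel_err = np.linalg.norm(design_matrix(X_test, indices) @ coeffs - y_test) / np.linalg.norm(y_test)
print(f"n = {len(indices)}, relative test error = {rel_err:.2e}")
```

Here $n$ is the number of multi-indices and $m$ the number of samples. Varying `m` relative to `len(indices)` gives a rough feel for how the sample complexity affects the error, although a single test function in low dimension says nothing about the high-dimensional regime the paper studies.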
