We consider the regression problem of estimating functions on $\mathbb{R}^D$ that are supported on a $d$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^D$ with $d \ll D$. Drawing ideas from multi-resolution analysis and nonlinear approximation, we construct low-dimensional coordinates on $\mathcal{M}$ at multiple scales and perform multiscale regression by local polynomial fitting. We propose a data-driven wavelet thresholding scheme that automatically adapts to the unknown regularity of the function, allowing for efficient estimation of functions exhibiting nonuniform regularity at different locations and scales. We analyze the generalization error of our method by proving finite sample bounds, with high probability, on rich classes of priors. Our estimator attains optimal learning rates (up to logarithmic factors) as if the function were defined on a known Euclidean domain of dimension $d$, rather than on an unknown manifold embedded in $\mathbb{R}^D$. The implemented algorithm has quasilinear complexity in the sample size, with constants linear in $D$ and exponential in $d$. Our work therefore establishes a new framework for regression on low-dimensional sets embedded in high dimensions, with fast implementation and strong theoretical guarantees.
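To make the high-level description concrete, the following is a minimal Python sketch of the multiscale idea: a dyadic partition of the sample, low-degree polynomial fits in local PCA coordinates at each scale, and a crude thresholding rule that keeps a finer fit only where it corrects its parent substantially. This is an illustration under simplifying assumptions, not the paper's implementation; all function names, the splitting rule, and the threshold value are hypothetical.

```python
# Illustrative sketch (not the authors' implementation) of multiscale local
# polynomial regression on a low-dimensional manifold embedded in R^D:
# recursively split the sample, fit a degree-one polynomial in local PCA
# coordinates inside each cell, and keep a finer fit only where it changes
# the parent fit by more than a threshold (a crude stand-in for the
# data-driven wavelet thresholding). Parameters below are assumptions.

import numpy as np

def local_pca_coords(X, d):
    """Project points X (n x D) onto their top-d local principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ Vt[:d].T

def fit_local_poly(X, y, d):
    """Least-squares degree-one fit in d-dimensional local PCA coordinates."""
    Z = local_pca_coords(X, d)
    A = np.hstack([np.ones((len(Z), 1)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef  # fitted values on the cell

def multiscale_fit(X, y, d=1, max_scale=4, threshold=0.05, min_pts=8):
    """Refine a piecewise-polynomial fit scale by scale, keeping a child
    cell's fit only when it corrects the parent fit by more than threshold."""
    n = len(y)
    fit = np.full(n, y.mean())          # coarsest approximation
    cells = [np.arange(n)]
    for _ in range(max_scale):
        next_cells = []
        for idx in cells:
            if len(idx) < 2 * min_pts:
                continue
            # split the cell along its first local principal direction
            z = local_pca_coords(X[idx], 1)[:, 0]
            left, right = idx[z <= np.median(z)], idx[z > np.median(z)]
            for child in (left, right):
                if len(child) < min_pts:
                    continue
                refined = fit_local_poly(X[child], y[child], d)
                # size of the correction to the parent fit ("wavelet coefficient")
                if np.linalg.norm(refined - fit[child]) / np.sqrt(len(child)) > threshold:
                    fit[child] = refined
                next_cells.append(child)
        cells = next_cells
    return fit

# Toy example: a function on a circle (d = 1) embedded in R^D by an isometry.
rng = np.random.default_rng(0)
D, n = 20, 2000
t = rng.uniform(0, 2 * np.pi, n)
circle = np.column_stack([np.cos(t), np.sin(t)])
Q, _ = np.linalg.qr(rng.standard_normal((D, 2)))
X = circle @ Q.T                          # points on a 1-d manifold in R^D
y = np.sin(3 * t) + 0.1 * rng.standard_normal(n)
y_hat = multiscale_fit(X, y, d=1)
print("empirical MSE:", np.mean((y_hat - np.sin(3 * t)) ** 2))
```

In this toy setting the ambient dimension $D$ enters only through the local PCA projections, while the partition depth and cell fits depend on the intrinsic dimension $d$, which loosely mirrors the stated complexity: constants linear in $D$, exponential in $d$.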