Data sites selected from modeling high-dimensional problems often appear scattered in irregular, pattern-free ways. Except for sporadic clustering at a few spots, they become relatively far apart as the dimension of the ambient space grows. These features defy any theoretical treatment that requires local or global quasi-uniformity in the distribution of data sites. Incorporating a recently developed application of integral operator theory in machine learning, we propose and study in the current article a new framework for analyzing kernel interpolation of high-dimensional data, whose central feature is bounding the stochastic approximation error by the spectrum of the underlying kernel matrix. Both theoretical analysis and numerical simulations show that the spectra of kernel matrices are reliable and stable barometers for gauging the performance of kernel-interpolation methods on high-dimensional data.
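For orientation, the following display recalls the standard kernel-interpolation setup that the abstract refers to; the notation ($K$, $\mathbb{K}$, $c$, $y$) is generic and not taken verbatim from the article. Given data sites $x_1,\dots,x_n$ and values $y_1,\dots,y_n$, the kernel interpolant is
\[
  s(x) \;=\; \sum_{j=1}^{n} c_j\, K(x, x_j),
  \qquad \text{where } \mathbb{K} c = y
  \quad \text{and} \quad
  \mathbb{K} = \bigl( K(x_i, x_j) \bigr)_{i,j=1}^{n}.
\]
The eigenvalues of $\mathbb{K}$ govern the conditioning of this linear system, which is why its spectrum can serve as the "barometer" of interpolation performance mentioned above.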