Kernel-based methods, including Gaussian process regression (GPR) and, more generally, kernel ridge regression (KRR), are increasingly used in computational chemistry, including for fitting potential energy surfaces and density functionals in high-dimensional feature spaces. Kernels of the Matérn family, such as Gaussian-like kernels (basis functions), are often used, which allows them to be interpreted as covariance functions and GPR to be formulated as an estimator of the mean of a Gaussian distribution. The notion of locality of the kernel is critical for this interpretation. It is also critical to the formulation of multi-zeta type basis functions widely used in computational chemistry. Using fits of molecular potential energy surfaces of increasing dimensionality as an example, we show that the locality of a Gaussian-like kernel practically disappears in high dimensionality. We also formulate a multi-zeta approach to the kernel and show that it significantly improves the quality of regression in low dimensionality but loses any advantage in high dimensionality, which we attribute to the loss of locality.
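The loss of locality can be illustrated with a minimal Python sketch. It is not the procedure used in this work: the points are drawn uniformly from a unit hypercube rather than from a molecular potential energy surface, and `multi_zeta_kernel` is a hypothetical construction (an equally weighted sum of Gaussians with different length scales) that only stands in for the multi-zeta idea. The function names, the sample size, and the length scales are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(sq_dist, length_scale):
    """Gaussian kernel evaluated on squared Euclidean distances."""
    return np.exp(-np.asarray(sq_dist) / (2.0 * length_scale**2))


def multi_zeta_kernel(sq_dist, length_scales, weights=None):
    """Hypothetical multi-zeta kernel: a weighted sum of Gaussians
    with several length scales (equal weights unless specified)."""
    length_scales = np.asarray(length_scales, dtype=float)
    if weights is None:
        weights = np.full(length_scales.shape, 1.0 / length_scales.size)
    return sum(w * gaussian_kernel(sq_dist, l)
               for w, l in zip(weights, length_scales))


def distance_contrast(dim, n_points=500, seed=0):
    """Ratio of nearest to farthest distance from a random reference
    point to n_points uniform samples in the unit hypercube.
    Values close to 1 mean that 'near' and 'far' are hard to tell apart."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    reference = rng.random(dim)
    dist = np.linalg.norm(points - reference, axis=1)
    return dist.min() / dist.max()


if __name__ == "__main__":
    for dim in (2, 10, 50, 200):
        print(f"D = {dim:4d}  nearest/farthest distance = "
              f"{distance_contrast(dim):.3f}")
```

As the dimensionality grows, the ratio of nearest to farthest distance approaches 1, so a Gaussian kernel with any single length scale assigns similar values to all training points. Under these assumptions, mixing several length scales cannot restore a distinction that the distances themselves no longer carry.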