We present a Python implementation of RS-HDMR-GPR (Random Sampling High Dimensional Model Representation Gaussian Process Regression). The method represents multivariate functions as sums of lower-dimensional terms, either as an expansion over orders of coupling or using terms of only a given dimensionality. This facilitates, in particular, the recovery of functional dependence from sparse data. The code also allows for imputation of missing values of the variables and for significant pruning of the number of HDMR terms used. It can also be used to estimate the relative importance of different combinations of input variables, thereby adding an element of insight to a general machine learning method. The capabilities of this regression tool are demonstrated on test cases involving synthetic analytic functions, the potential energy surface of the water molecule, kinetic energy densities of materials (crystalline magnesium, aluminum, and silicon), and financial market data.
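The two kinds of representation mentioned above can be written out in standard HDMR notation. This is a sketch only: the symbols $D$ (number of input variables), $d$ (dimensionality of the terms), and the component functions $f_0$, $f_i$, $f_{ij}$ are notation assumed here for illustration, not taken from the text. The expansion over orders of coupling reads

\[
f(x_1,\dots,x_D) \approx f_0 + \sum_{i=1}^{D} f_i(x_i) + \sum_{1 \le i < j \le D} f_{ij}(x_i, x_j) + \dots + f_{12\dots D}(x_1,\dots,x_D),
\]

while the representation using terms of only a given dimensionality $d \le D$ reads

\[
f(x_1,\dots,x_D) \approx \sum_{1 \le i_1 < \dots < i_d \le D} f_{i_1 \dots i_d}(x_{i_1},\dots,x_{i_d}).
\]

The second form contains $\binom{D}{d}$ component functions, and the full expansion contains $2^D$ terms, so the number of terms grows quickly with $D$. This is presumably why pruning the set of terms matters in practice. As the name of the method suggests, each component function is represented with Gaussian process regression.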