Machine learning often requires modelling the density of a multidimensional data sample, including correlations between coordinates. In addition, data are often incomplete: data points may lack values for some coordinates. This article adapts the rapid parametric density estimation approach to this setting. Density is modelled as a linear combination of orthonormal functions, for which $L^2$ optimization shows that the coefficient of a given function can be estimated independently of the others, as the average value of this function over the sample. Hierarchical correlation reconstruction first models the probability density of each coordinate separately, using all its appearances in the data sample. It then adds corrections from independently modelled pairwise correlations, using all samples that contain both coordinates, and continues in this way, independently adding correlations for growing numbers of variables, typically supported by decreasing evidence in the data sample. A basic application of the resulting multidimensional density is imputation of missing coordinates: substituting the known coordinates into the density and taking the expected values of the missing coordinates, or even their entire joint probability distribution. The presented method can be compared with the cascade correlation approach, offering several advantages in flexibility and accuracy. It can also be used as an artificial neuron: maximizing prediction capabilities for local behavior only, modelling and predicting local connections.
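The estimation scheme described in the abstract can be illustrated with a minimal two-coordinate sketch. It rests on several assumptions that are not stated in the abstract: coordinates are already normalized to $[0,1]$, missing values are encoded as NaN, and the orthonormal basis consists of the first three Legendre polynomials rescaled to $[0,1]$. The function names (`basis`, `fit_marginal`, `fit_pair`, `fit_hierarchical`, `impute_y`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Assumed basis: Legendre polynomials rescaled to be orthonormal on [0, 1].
def basis(x):
    x = np.asarray(x, dtype=float)
    return np.stack([
        np.ones_like(x),
        np.sqrt(3.0) * (2.0 * x - 1.0),
        np.sqrt(5.0) * (6.0 * x**2 - 6.0 * x + 1.0),
    ])  # shape (3,) for scalar x, (3, n) for a vector of n values

def fit_marginal(col):
    """a_j = average of f_j over every observed value of one coordinate."""
    observed = col[~np.isnan(col)]
    return basis(observed).mean(axis=1)

def fit_pair(col_x, col_y):
    """a_jk = average of f_j(x) f_k(y) over rows where both values are present."""
    both = ~np.isnan(col_x) & ~np.isnan(col_y)
    fx, fy = basis(col_x[both]), basis(col_y[both])
    return fx @ fy.T / both.sum()

def fit_hierarchical(col_x, col_y):
    """Marginal terms use all appearances; mixed terms (j, k >= 1) use complete rows."""
    a = fit_pair(col_x, col_y)
    a[:, 0] = fit_marginal(col_x)  # terms depending on x only
    a[0, :] = fit_marginal(col_y)  # terms depending on y only
    return a

def impute_y(a, x):
    """Expected y given x under rho(x, y) = sum_jk a_jk f_j(x) f_k(y)."""
    c = basis(x) @ a  # coefficients of f_k(y) once x is substituted
    # Integrals of y * f_k(y) over [0, 1] for k = 0, 1, 2.
    first_moments = np.array([0.5, 1.0 / (2.0 * np.sqrt(3.0)), 0.0])
    # c[0] is the integral of rho(x, y) over y (the normalization).
    return (c @ first_moments) / c[0]

# Synthetic demonstration data (an assumption, not taken from the article).
rng = np.random.default_rng(0)
x = rng.random(2000)
y = np.clip(x + 0.1 * rng.normal(size=2000), 0.0, 1.0)
x[rng.random(2000) < 0.3] = np.nan  # drop roughly 30% of each coordinate
y[rng.random(2000) < 0.3] = np.nan

a = fit_hierarchical(x, y)
print("imputed y for x = 0.1:", impute_y(a, 0.1))
print("imputed y for x = 0.9:", impute_y(a, 0.9))
```

Because a truncated polynomial expansion is not guaranteed to be non-negative, the normalization `c[0]` can approach zero or become negative for some inputs; a practical implementation would need to handle that case, which this sketch does not.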