Machine learning often requires modelling the density of a multidimensional data sample, including correlations between coordinates. In addition, data are frequently incomplete: data points may be missing values for some coordinates. This article adapts the rapid parametric density estimation approach for this purpose. Density is modelled as a linear combination of orthonormal functions, for which $L^2$ optimization implies that the coefficient of a given function can be estimated independently of the others, simply as the average of that function's values over the sample. Hierarchical correlation reconstruction first models the probability density of each coordinate separately, using all its appearances in the data sample. It then adds corrections from independently modelled pairwise correlations, using all data points that contain both coordinates, and continues in this way, independently adding correlations for growing numbers of variables based on decreasing evidence in the data sample. A basic application of such a modelled multidimensional density is imputation of missing coordinates: the known coordinates are inserted into the density, the expected values of the missing coordinates are taken as estimates, and optionally their variances are used to estimate uncertainty. The presented method can be compared with the cascade correlation approach, offering several advantages in flexibility and accuracy. It can also be used as an artificial neuron, maximizing prediction capabilities for local behavior only: modelling and predicting local connections.
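The estimation and imputation steps described in the abstract can be illustrated with a minimal sketch for two coordinates. This is not the article's implementation: it assumes the data are already normalized to $[0, 1]$, that missing values are encoded as NaN, and that the orthonormal basis consists of rescaled Legendre polynomials up to degree 2. The function names `basis`, `fit_pair` and `impute_y` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def basis(x):
    """Orthonormal polynomials on [0, 1] (rescaled Legendre), degrees 0..2.
    The choice of basis and degree is an assumption of this sketch."""
    x = np.asarray(x, dtype=float)
    return np.stack([
        np.ones_like(x),
        np.sqrt(3.0) * (2.0 * x - 1.0),
        np.sqrt(5.0) * (6.0 * x**2 - 6.0 * x + 1.0),
    ])

def fit_pair(data):
    """Estimate coefficients a[j, k] of rho(x, y) ~ sum_jk a[j, k] f_j(x) f_k(y).

    data: (n, 2) array with values in [0, 1]; NaN marks a missing value.
    Each coefficient is the sample average of its basis function, computed
    from every row in which the required coordinates are present.
    """
    x, y = data[:, 0], data[:, 1]
    has_x, has_y = ~np.isnan(x), ~np.isnan(y)
    both = has_x & has_y
    a = np.zeros((3, 3))
    # Single-coordinate terms: use all appearances of each coordinate.
    a[:, 0] = basis(x[has_x]).mean(axis=1)
    a[0, :] = basis(y[has_y]).mean(axis=1)
    # Pairwise corrections: use only rows that contain both coordinates.
    fx, fy = basis(x[both]), basis(y[both])
    a[1:, 1:] = (fx[1:, None, :] * fy[None, 1:, :]).mean(axis=2)
    return a

def impute_y(a, x):
    """Expected value of y given x under the fitted polynomial density.

    Uses the integrals of f_k(y) and y * f_k(y) over [0, 1]:
    (1, 0, 0) and (1/2, 1/(2*sqrt(3)), 0) for k = 0, 1, 2.
    The polynomial density is not guaranteed to be positive, so the result
    is unreliable wherever `norm` is close to zero or negative.
    """
    fx = basis(x)
    norm = fx @ a[:, 0]    # integral of rho(x, y) over y
    first = fx @ a[:, 1]   # weight of the f_1(y) term at this x
    return 0.5 + first / (2.0 * np.sqrt(3.0) * norm)

# Usage on synthetic data (illustrative only, not from the article).
rng = np.random.default_rng(0)
x = rng.random(5000)
y = 0.8 * x + 0.1 + rng.uniform(-0.1, 0.1, size=x.size)
sample = np.column_stack([x, y])
sample[rng.random(x.size) < 0.2, 1] = np.nan   # some rows miss y
sample[rng.random(x.size) < 0.2, 0] = np.nan   # some rows miss x
coeffs = fit_pair(sample)
print(impute_y(coeffs, 0.9), impute_y(coeffs, 0.1))
```

Because every coefficient is an independent average, rows with a missing coordinate still contribute to the single-coordinate terms, and only the pairwise block needs complete rows. Extending the sketch to more coordinates or higher-order correlations would follow the same pattern with larger coefficient tensors.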