Modern computational models in supervised machine learning are often highly parameterized universal approximators. As such, the parameter values themselves are of no intrinsic interest, and only out-of-sample performance is considered. On the other hand, much of the literature on model estimation assumes that the parameters themselves have intrinsic value, and is thus concerned with the bias and variance of parameter estimates, which may not have any simple relationship to out-of-sample model performance. Therefore, within supervised machine learning, heavy use is made of ridge regression (i.e., L2 regularization), which requires the estimation of hyperparameters and can be rendered ineffective by certain model parameterizations. We introduce an objective function, which we refer to as Information-Corrected Estimation (ICE), that reduces KL-divergence-based generalization error in supervised machine learning. ICE attempts to directly maximize a corrected likelihood function as an estimator of the KL divergence. This approach is proven, theoretically, to be effective for a wide class of models under only mild regularity restrictions. Under finite sample sizes, this corrected estimation procedure is shown experimentally to yield significant reductions in generalization error compared to maximum likelihood estimation and L2 regularization.
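To make the idea of a "corrected likelihood function" concrete, the sketch below shows one plausible reading of such an objective, assuming a Takeuchi-style trace penalty tr(J^{-1} I)/n added to the mean negative log-likelihood, where I is the average outer product of per-example score vectors and J is the Hessian of the mean negative log-likelihood. The exact form of the ICE penalty is defined in the paper itself; the function names (`ice_objective`, `nll_per_example`) and the toy Gaussian linear model here are hypothetical and for illustration only.

```python
# A minimal sketch of an ICE-style corrected objective, assuming the correction
# takes the Takeuchi-style form tr(J^{-1} I) / n added to the mean negative
# log-likelihood. The exact ICE penalty in the paper may differ in detail.
import jax
import jax.numpy as jnp


def nll_per_example(theta, x, y):
    """Negative log-likelihood of one observation under a toy Gaussian
    linear model y ~ N(x . w, exp(log_sigma)^2); theta = (w, log_sigma)."""
    w, log_sigma = theta[:-1], theta[-1]
    mu = x @ w
    sigma2 = jnp.exp(2.0 * log_sigma)
    return 0.5 * (jnp.log(2.0 * jnp.pi * sigma2) + (y - mu) ** 2 / sigma2)


def ice_objective(theta, X, Y):
    """Mean NLL plus a trace correction estimating the optimism of the
    in-sample likelihood (hypothetical form, for illustration only)."""
    n = X.shape[0]
    mean_nll = jnp.mean(jax.vmap(lambda x, y: nll_per_example(theta, x, y))(X, Y))

    # I: average outer product of per-example score vectors (gradients w.r.t. theta).
    grads = jax.vmap(jax.grad(nll_per_example), in_axes=(None, 0, 0))(theta, X, Y)
    I = grads.T @ grads / n

    # J: Hessian of the mean negative log-likelihood w.r.t. theta.
    J = jax.hessian(
        lambda t: jnp.mean(jax.vmap(lambda x, y: nll_per_example(t, x, y))(X, Y))
    )(theta)

    # Penalized objective; minimizing it plays the role of maximizing the
    # corrected likelihood described in the abstract.
    correction = jnp.trace(jnp.linalg.solve(J, I)) / n
    return mean_nll + correction


# Usage: minimize ice_objective over theta with any gradient-based optimizer,
# e.g. via jax.value_and_grad(ice_objective) and plain gradient descent.
```

Minimizing this penalized objective, rather than the raw negative log-likelihood, is what distinguishes a corrected estimation procedure from maximum likelihood estimation; unlike ridge regression, the sketch above introduces no hyperparameter to tune.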