A crucial assumption underlying most current machine learning theory is that the training distribution is identical to the test distribution. However, this assumption may not hold in some real-world applications. In this paper, we develop a learning model based on principles of information theory by minimizing the worst-case loss at a prescribed level of uncertainty. We reformulate the empirical estimate of the risk functional and the distribution deviation constraint using importance sampling. Because the objective of the proposed approach is to minimize the loss under maximal degradation, the resulting problem is a minimax problem, which can be converted into an unconstrained minimization problem by the Lagrange method with Lagrange multiplier $T$. We show that minimizing the objective function under a logarithmic transformation is equivalent to minimizing the $p$-norm loss with $p=\frac{1}{T}$. We apply the proposed model to the face verification task on the Racial Faces in the Wild dataset and show that it performs better under large distribution deviations.
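For intuition, here is a minimal sketch of the claimed equivalence, under the assumption (suggested by the information-theoretic setup and the temperature-like multiplier $T$, but not stated explicitly above) that the distribution deviation is measured by the KL divergence, so that the Lagrangian dual of the worst-case risk takes the standard log-sum-exp form $T \log \frac{1}{n}\sum_i \exp(\ell_i/T)$ over the empirical sample. Writing the per-sample losses under the logarithmic transformation as $\ell_i = \log s_i$ for scores $s_i > 0$, the empirical dual objective becomes
\begin{align*}
T \log \frac{1}{n}\sum_{i=1}^{n} \exp\!\left(\frac{\ell_i}{T}\right)
  &= T \log \frac{1}{n}\sum_{i=1}^{n} s_i^{1/T} \\
  &= \log \left(\sum_{i=1}^{n} s_i^{1/T}\right)^{T} - T\log n \\
  &= \log \|s\|_{1/T} - T\log n .
\end{align*}
Since $\log$ is monotone and $T\log n$ does not depend on the model parameters, minimizing this objective is equivalent to minimizing the $p$-norm $\|s\|_p$ of the per-sample scores with $p = \frac{1}{T}$ (for $T > 1$ this is only a quasi-norm, but the equivalence is unaffected).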