A common assumption in machine learning is that samples are independently and identically distributed (i.i.d.). However, the contributions of different samples to training are not identical: some samples are difficult to learn, and some are noisy. These unequal contributions have a considerable effect on training performance. Studies focusing on unequal sample contributions (e.g., easy, hard, and noisy samples) in learning are usually referred to as robust machine learning (RML). Weighting and regularization are two common techniques in RML. Numerous learning algorithms have been proposed, but their strategies for dealing with easy/hard/noisy samples differ or even contradict one another; for example, some strategies learn hard samples first, whereas others learn easy samples first. A clear comparison of existing RML algorithms in dealing with different samples is difficult because a unified theoretical framework for RML is lacking. This study attempts to construct a mathematical foundation for RML based on the bias-variance trade-off theory. A series of definitions and properties is presented and proved. Several classical learning algorithms are also explained and compared, and improvements to existing methods are obtained on the basis of this comparison. Finally, a unified method that combines two classical learning strategies is proposed.
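For reference, the bias-variance trade-off that the abstract invokes is, in its textbook form, the following decomposition of the expected squared error; the paper's own formulation may differ, so this is given only as background:

```latex
% Textbook bias-variance decomposition of the expected squared error
% at an input x, for y = f(x) + \epsilon with noise variance \sigma^2
% and a predictor \hat{f}(x; S) trained on a random sample S:
\mathbb{E}_{S,\epsilon}\!\left[\big(y - \hat{f}(x;S)\big)^2\right]
  = \underbrace{\big(f(x) - \mathbb{E}_S[\hat{f}(x;S)]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_S\!\left[\big(\hat{f}(x;S) - \mathbb{E}_S[\hat{f}(x;S)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```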
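As a concrete illustration of the contradictory weighting strategies mentioned above, the minimal sketch below contrasts an easy-first rule (in the style of self-paced learning, which keeps only low-loss samples) with a hard-first rule (in the style of the focal loss, which up-weights high-loss samples). The function names, the threshold lam, and the probability proxy are illustrative assumptions, not the paper's method:

```python
import numpy as np

def easy_first_weights(losses, lam=1.0):
    # Self-paced-style weighting: samples whose loss is below the
    # threshold lam are kept (weight 1); harder samples are dropped.
    return (losses < lam).astype(float)

def hard_first_weights(losses, gamma=2.0):
    # Focal-style weighting: weight grows with the loss, so hard
    # (high-loss) samples dominate the weighted objective.
    p = np.exp(-losses)          # crude proxy for predicted probability
    return (1.0 - p) ** gamma

losses = np.array([0.1, 0.5, 1.2, 3.0])  # per-sample training losses
print(easy_first_weights(losses))        # [1. 1. 0. 0.] -> easy samples kept
print(hard_first_weights(losses))        # weights increase with loss
```

Applied to the same per-sample losses, the two rules emphasize opposite ends of the difficulty spectrum, which is exactly the kind of contradiction a unified theoretical framework would need to reconcile.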