Machine learning (ML) formalizes the problem of getting computers to learn from experience as optimization of performance, according to some metric(s), on a set of data examples. This is in contrast to requiring behaviour to be specified in advance (e.g. by hard-coded rules). Formalizing the problem this way has enabled great progress in many applications with large real-world impact, including translation, speech recognition, self-driving cars, and drug discovery. But practical instantiations of this formalism make many assumptions (for example, that data are i.i.d.: independent and identically distributed) whose soundness is seldom investigated. And in making great progress in such a short time, the field has developed many norms and ad-hoc standards, focused on a relatively small range of problem settings. As applications of ML, particularly in artificial intelligence (AI) systems, become more pervasive in the real world, we need to critically examine these assumptions, norms, and problem settings, as well as the methods that have become de facto standards. There is still much we do not understand about how and why deep networks trained with stochastic gradient descent generalize as well as they do, why they fail when they do, and how they will perform on out-of-distribution data. In this thesis I cover some of my work towards better understanding deep net generalization, identify several ways in which assumptions and problem settings fail to generalize to the real world, and propose ways to address those failures in practice.
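To make the i.i.d. assumption concrete, here is a minimal illustrative sketch (not from the thesis; all distributions and numbers are invented for illustration). A simple classifier is fit on one data distribution and then evaluated both on an i.i.d. test set and on a covariate-shifted test set, where the input distribution has moved relative to training:

```python
import random

random.seed(0)

def sample(mean, n):
    # 1-D feature drawn from a Gaussian around `mean`
    return [random.gauss(mean, 0.5) for _ in range(n)]

# Training data: class 0 centred at 0.0, class 1 centred at 2.0
train = [(x, 0) for x in sample(0.0, 500)] + [(x, 1) for x in sample(2.0, 500)]

# "Learn" a decision threshold: the midpoint of the two class means
m0 = sum(x for x, y in train if y == 0) / 500
m1 = sum(x for x, y in train if y == 1) / 500
threshold = (m0 + m1) / 2

def accuracy(data):
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

# I.i.d. test set: drawn from the same distributions as training
iid_test = [(x, 0) for x in sample(0.0, 500)] + [(x, 1) for x in sample(2.0, 500)]

# Shifted test set: both class means moved by +1.0 (covariate shift),
# so the learned threshold is no longer well placed
shifted_test = [(x, 0) for x in sample(1.0, 500)] + [(x, 1) for x in sample(3.0, 500)]

print(f"i.i.d. accuracy:  {accuracy(iid_test):.2f}")
print(f"shifted accuracy: {accuracy(shifted_test):.2f}")
```

Under the i.i.d. assumption the learned threshold transfers to the test set and accuracy stays high; under the shifted distribution the same model degrades sharply, even though nothing about the model changed. This is the kind of silent assumption failure the thesis argues deserves scrutiny.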