No free lunch theorems for supervised learning state that no learner can solve all problems, or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets have high complexity, real-world problems disproportionately generate low-complexity data, and we argue that neural network models share this same preference, formalized using Kolmogorov complexity. Notably, we show that architectures designed for a particular domain, such as computer vision, can compress datasets from a variety of seemingly unrelated domains. Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences. Whereas no free lunch theorems seemingly indicate that individual problems require specialized learners, we explain how tasks that often require human intervention, such as picking an appropriately sized model when labeled data is scarce or plentiful, can be automated into a single learning algorithm. These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.
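To make the complexity contrast concrete, the following minimal sketch (our illustration, not the paper's experimental setup) uses an off-the-shelf compressor as a computable upper bound on Kolmogorov complexity: a structured sequence compresses to a small fraction of its raw length, while a uniformly sampled sequence of the same length stays nearly incompressible. Treating gzip-compressed length as the complexity proxy is an assumption made here purely for illustration.

```python
import gzip
import os

def compressed_size(data: bytes) -> int:
    """Upper-bound the Kolmogorov complexity of `data` by its gzip-compressed length in bytes."""
    return len(gzip.compress(data, compresslevel=9))

# A structured ("low-complexity") sequence: generated by a short program, so it compresses well.
structured = b"0123456789" * 10_000

# A uniformly sampled sequence of the same length: almost surely close to incompressible.
uniform = os.urandom(len(structured))

print("raw length          :", len(structured), "bytes")
print("structured, gzipped :", compressed_size(structured), "bytes")
print("uniform, gzipped    :", compressed_size(uniform), "bytes")
# Expected outcome: the structured sequence shrinks by orders of magnitude,
# while the uniform sequence remains roughly its original size.
```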