Though deep learning models have taken on commercial and political relevance, many aspects of their training and operation remain poorly understood. This has sparked interest in "science of deep learning" projects, many of which are run at scale and require enormous amounts of time, money, and electricity. But how much of this research really needs to occur at scale? In this paper, we introduce MNIST-1D: a minimalist, low-memory, and low-compute alternative to classic deep learning benchmarks. The training examples are 20 times smaller than MNIST examples, yet they differentiate more clearly between linear, nonlinear, and convolutional models, which attain 32%, 68%, and 94% accuracy, respectively (the same models obtain 94%, 99+%, and 99+% accuracy on MNIST). We then present example use cases, including measuring the spatial inductive biases of lottery tickets, observing deep double descent, and metalearning an activation function.
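A minimal sketch of the linear-versus-nonlinear comparison described above, assuming the authors' public `mnist1d` package (installable via `pip install mnist1d`) with its documented `make_dataset`/`get_dataset_args` interface; the scikit-learn classifiers below are illustrative stand-ins for the paper's own models, not the exact architectures it benchmarks:

```python
from mnist1d.data import make_dataset, get_dataset_args
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Generate MNIST-1D with the default config: length-40 1D signals,
# 20x smaller than MNIST's 784-pixel images.
defaults = get_dataset_args()
data = make_dataset(defaults)
x, y = data['x'], data['y']                  # training set
x_test, y_test = data['x_test'], data['y_test']

# Linear baseline: the paper reports roughly 32% test accuracy
# for linear models on this dataset.
linear = LogisticRegression(max_iter=1000).fit(x, y)

# Nonlinear (fully connected) baseline: roughly 68% in the paper.
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000).fit(x, y)

print('linear test accuracy:', linear.score(x_test, y_test))
print('mlp test accuracy:   ', mlp.score(x_test, y_test))
```

Exact scores will vary with the random seed and training budget; the point of the sketch is that the gap between model classes is much wider on MNIST-1D than on MNIST, where all three families exceed 94%.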