Running faster will only get you so far -- it is generally advisable to first understand where the roads lead, then get a car ... The renaissance of machine learning (ML) and deep learning (DL) over the last decade has been accompanied by unscalable computational costs, limiting their advancement and weighing on the field in practice. In this thesis we take a systematic approach to addressing the algorithmic and methodological limitations at the root of these costs. We first demonstrate that DL training and pruning are predictable and governed by scaling laws -- for state-of-the-art models and tasks, spanning image classification and language modeling, as well as for state-of-the-art model compression via iterative pruning. Predictability, through the establishment of these scaling laws, provides a path toward principled design and trade-off reasoning, which is currently largely lacking in the field. We then analyze the sources of these scaling laws, offering an approximation-theoretic view and showing, through the exploration of a noiseless realizable case, that DL is in fact dominated by error sources very far from the lower error limit. We conclude by building on this theoretical understanding of the scaling laws' origins: we present a conjectural path to eliminating one of the currently dominant error sources -- through a data-bandwidth-limiting hypothesis and the introduction of Nyquist learners -- which can, in principle, reach the generalization error lower limit (e.g., 0 in the noiseless case) at finite dataset size.
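As an illustrative sketch (a generic functional form commonly used in the scaling-law literature, not the specific law established in this thesis), the generalization error is often modeled as a power law in the dataset size $n$, decaying toward an irreducible error floor; the symbols $a$, $\alpha$, and $\epsilon_{\infty}$ below are illustrative fitted constants:
$$ \epsilon(n) \;\approx\; a\, n^{-\alpha} + \epsilon_{\infty}, \qquad \alpha > 0, $$
where, in this notation, the generalization error lower limit corresponds to $\epsilon_{\infty}$ (0 in the noiseless case).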