While post-training model compression can greatly reduce the inference cost of a deep neural network, uncompressed training still consumes a huge amount of hardware resources, runtime, and energy. It is therefore highly desirable to train a compact neural network from scratch with low memory and low computational cost. Low-rank tensor decomposition is one of the most effective approaches for reducing the memory and computing requirements of large neural networks. However, directly training a low-rank tensorized neural network is very challenging because it is hard to determine a proper tensor rank {\it a priori}, which controls the model complexity and compression ratio during training. This paper presents a novel end-to-end framework for low-rank tensorized training of neural networks. We first develop a flexible Bayesian model that can handle various low-rank tensor formats (e.g., CP, Tucker, tensor train, and tensor-train matrix) to compress neural network parameters during training. This model can automatically determine the tensor ranks inside a nonlinear forward model, which is beyond the capability of existing Bayesian tensor methods. We further develop a scalable stochastic variational inference solver to estimate the posterior density in large-scale training problems. Our work provides the first general-purpose rank-adaptive framework for end-to-end tensorized training. Our numerical results on various neural network architectures show orders-of-magnitude parameter reduction with little accuracy loss (or even better accuracy) during training. Specifically, on a very large deep learning recommendation system with over $4.2\times 10^9$ model parameters, our method automatically reduces the number of variables to only $1.6\times 10^5$ during training (i.e., a $2.6\times 10^4\times$ reduction) while achieving almost the same accuracy.
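The parameter-reduction idea behind tensorized layers can be illustrated with a minimal NumPy sketch of the tensor-train-matrix (TT-matrix) format mentioned above. The mode sizes, ranks, and variable names below are assumptions chosen for demonstration, not the paper's implementation; in the proposed framework the ranks would be determined automatically during training rather than fixed by hand:

```python
import numpy as np

# Illustrative sketch: represent a 256 x 256 weight matrix in
# TT-matrix format. Row and column dimensions are factorized as
# 256 = 4*4*4*4, and each mode pair gets one 4-way TT core.
m = [4, 4, 4, 4]      # row-mode factorization (assumed)
n = [4, 4, 4, 4]      # column-mode factorization (assumed)
r = [1, 2, 2, 2, 1]   # TT ranks; rank-adaptive training would learn these

rng = np.random.default_rng(0)
# One core per mode pair, with shape (r[k], m[k], n[k], r[k+1]).
cores = [rng.standard_normal((r[k], m[k], n[k], r[k + 1]))
         for k in range(4)]

# Contract the cores back into the full 256 x 256 matrix.
full = cores[0]
for core in cores[1:]:
    # merge the shared rank index between consecutive cores
    full = np.tensordot(full, core, axes=([-1], [0]))
# axes are now (r0, m1, n1, m2, n2, m3, n3, m4, n4, r4)
full = full.squeeze(axis=(0, -1))
# group row modes together and column modes together, then flatten
full = full.transpose(0, 2, 4, 6, 1, 3, 5, 7).reshape(256, 256)

tt_params = sum(c.size for c in cores)
dense_params = 256 * 256
print(tt_params, dense_params)  # 192 vs. 65536: a ~341x reduction
```

The compression ratio grows rapidly with layer size, which is how the much larger factor reported for the recommendation system becomes possible: the TT parameter count scales with the sum of the core sizes rather than the product of the matrix dimensions.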