Advanced tensor decompositions, such as the tensor train (TT), have been widely studied for tensor decomposition-based neural network (NN) training, one of the most common model compression methods. However, training NNs with tensor decomposition often suffers from significant accuracy loss and convergence issues. In this paper, a holistic framework is proposed for tensor decomposition-based NN training by formulating TT decomposition-based NN training as a nonconvex optimization problem. This problem is solved by the proposed tensor block coordinate descent (tenBCD) method, a gradient-free algorithm. The global convergence of tenBCD to a critical point at a rate of $O(1/k)$, where $k$ is the number of iterations, is established via the Kurdyka-{\L}ojasiewicz (K{\L}) property. The theoretical results extend to the popular residual neural networks (ResNets). The effectiveness and efficiency of the proposed framework are verified on an image classification dataset, where the proposed method converges efficiently during training and prevents overfitting.
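To make the compression idea concrete, below is a minimal sketch, not the paper's implementation, of storing a fully connected layer's weight matrix in TT format. The mode shapes `m_modes` and `n_modes`, the TT ranks, and the helper `tt_to_matrix` are illustrative assumptions chosen for this example; a real TT layer would contract inputs against the cores directly rather than materializing the full matrix.

```python
import numpy as np

# Illustrative factorization of a 256x256 weight matrix into TT format.
# All shapes and rank values below are assumptions for this sketch.
m_modes, n_modes = (4, 8, 8), (4, 8, 8)   # row/column modes: 4*8*8 = 256
ranks = (1, 16, 16, 1)                    # TT ranks r_0..r_3 (boundary ranks are 1)

# TT cores G_k of shape (r_{k-1}, m_k, n_k, r_k); random init for illustration.
cores = [np.random.randn(ranks[k], m_modes[k], n_modes[k], ranks[k + 1]) * 0.1
         for k in range(3)]

def tt_to_matrix(cores):
    """Contract TT cores back into the full weight matrix (for checking only)."""
    full = cores[0]                                   # shape (1, m1, n1, r1)
    for core in cores[1:]:
        # contract the shared TT rank index; (m_k, n_k) mode pairs interleave
        full = np.tensordot(full, core, axes=([-1], [0]))
    full = full.squeeze(axis=(0, -1))                 # drop boundary rank axes
    # reorder axes to (m1, m2, m3, n1, n2, n3), then flatten to a matrix
    d = len(cores)
    full = full.transpose([2 * k for k in range(d)] + [2 * k + 1 for k in range(d)])
    return full.reshape(np.prod(m_modes), np.prod(n_modes))

W = tt_to_matrix(cores)
print(W.shape)                                        # (256, 256)
print(sum(c.size for c in cores), "TT parameters vs", W.size, "dense parameters")
```

With these assumed ranks the TT cores hold 17,664 parameters versus 65,536 for the dense matrix, which illustrates why such layers compress well; the paper's tenBCD method then treats each core (and each layer's activations) as a block and updates the blocks in turn without backpropagated gradients.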