Tensorial Convolutional Neural Networks (TCNNs) have attracted much research attention for their power in reducing model parameters or enhancing generalization ability. However, the exploration of TCNNs is hindered even by weight initialization methods. Specifically, general initialization methods, such as Xavier or Kaiming initialization, usually fail to generate appropriate weights for TCNNs. Meanwhile, although there are ad-hoc approaches for specific architectures (e.g., Tensor Ring Nets), they are not applicable to TCNNs built on other tensor decompositions (e.g., CP or Tucker decomposition). To address this problem, we propose a universal weight initialization paradigm that generalizes the Xavier and Kaiming methods and is widely applicable to arbitrary TCNNs. Specifically, we first present the Reproducing Transformation to convert the backward process in TCNNs into an equivalent convolution process. Then, based on the convolution operators in the forward and backward processes, we build a unified paradigm to control the variance of both features and gradients in TCNNs. Thus, we can derive fan-in and fan-out initializations for various TCNNs. We demonstrate that our paradigm can stabilize the training of TCNNs, leading to faster convergence and better results.
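To illustrate the flavor of fan-in initialization for a factorized kernel, the following is a minimal sketch, not the paper's exact derivation or its general paradigm: it only covers a CP-decomposed convolution kernel, and it simply scales each factor so that the reconstructed kernel has a Kaiming-style fan-in variance. The function name, the four-factor CP layout, and the factor names A, B, C, D are assumptions introduced here for illustration.

```python
import torch

def cp_fanin_init(c_out, c_in, kh, kw, rank, gain=2.0):
    """Illustrative sketch: CP factors whose composed kernel has Var ~= gain / fan_in."""
    fan_in = c_in * kh * kw
    target_var = gain / fan_in                 # Kaiming-style fan-in target (gain=2 for ReLU)
    # Composed kernel: W[o,i,h,w] = sum_r A[o,r] * B[i,r] * C[h,r] * D[w,r].
    # With zero-mean i.i.d. factors, Var(W) = rank * var_A * var_B * var_C * var_D,
    # so each of the 4 factors gets variance (target_var / rank) ** (1/4),
    # i.e. standard deviation (target_var / rank) ** (1/8).
    factor_std = (target_var / rank) ** 0.125
    A = torch.randn(c_out, rank) * factor_std
    B = torch.randn(c_in, rank) * factor_std
    C = torch.randn(kh, rank) * factor_std
    D = torch.randn(kw, rank) * factor_std
    return A, B, C, D

# Sanity check: reconstruct the kernel and compare its variance with the target.
A, B, C, D = cp_fanin_init(64, 32, 3, 3, rank=16)
W = torch.einsum('or,ir,hr,wr->oihw', A, B, C, D)
print(W.var().item(), 2.0 / (32 * 3 * 3))
```

The key point the sketch conveys is that naively giving every factor the variance a standard layer would use makes the composed kernel's variance depend on the rank and the number of factors; a TCNN-aware initialization instead budgets the target variance across the factors.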