在反向传播过程中需要对激活han函数进行求导,如果导数大于1,那么随着网络层数的增加梯度更新将会朝着指数爆炸的方式增加这就是梯度爆炸。同样如果导数小于1,那么随着网络层数的增加梯度更新信息会朝着指数衰减的方式减少这就是梯度消失。

热门内容

Tensor Train (TT) approach has been successfully applied in the modelling of the multilinear interaction of features. Nevertheless, the existing models lack flexibility and generalizability, as they only model a single type of high-order correlation. In practice, multiple multilinear correlations may exist within the features. In this paper, we present a novel Residual Tensor Train (ResTT) which integrates the merits of TT and residual structure to capture the multilinear feature correlations, from low to higher orders, within the same model. In particular, we prove that the fully-connected layer in neural networks and the Volterra series can be taken as special cases of ResTT. Furthermore, we derive the rule for weight initialization that stabilizes the training of ResTT based on a mean-field analysis. We prove that such a rule is much more relaxed than that of TT, which means ResTT can easily address the vanishing and exploding gradient problem that exists in the current TT models. Numerical experiments demonstrate that ResTT outperforms the state-of-the-art tensor network approaches, and is competitive with the benchmark deep learning models on MNIST and Fashion-MNIST datasets.

0
0
下载
预览

最新内容

Tensor Train (TT) approach has been successfully applied in the modelling of the multilinear interaction of features. Nevertheless, the existing models lack flexibility and generalizability, as they only model a single type of high-order correlation. In practice, multiple multilinear correlations may exist within the features. In this paper, we present a novel Residual Tensor Train (ResTT) which integrates the merits of TT and residual structure to capture the multilinear feature correlations, from low to higher orders, within the same model. In particular, we prove that the fully-connected layer in neural networks and the Volterra series can be taken as special cases of ResTT. Furthermore, we derive the rule for weight initialization that stabilizes the training of ResTT based on a mean-field analysis. We prove that such a rule is much more relaxed than that of TT, which means ResTT can easily address the vanishing and exploding gradient problem that exists in the current TT models. Numerical experiments demonstrate that ResTT outperforms the state-of-the-art tensor network approaches, and is competitive with the benchmark deep learning models on MNIST and Fashion-MNIST datasets.

0
0
下载
预览

最新论文

Tensor Train (TT) approach has been successfully applied in the modelling of the multilinear interaction of features. Nevertheless, the existing models lack flexibility and generalizability, as they only model a single type of high-order correlation. In practice, multiple multilinear correlations may exist within the features. In this paper, we present a novel Residual Tensor Train (ResTT) which integrates the merits of TT and residual structure to capture the multilinear feature correlations, from low to higher orders, within the same model. In particular, we prove that the fully-connected layer in neural networks and the Volterra series can be taken as special cases of ResTT. Furthermore, we derive the rule for weight initialization that stabilizes the training of ResTT based on a mean-field analysis. We prove that such a rule is much more relaxed than that of TT, which means ResTT can easily address the vanishing and exploding gradient problem that exists in the current TT models. Numerical experiments demonstrate that ResTT outperforms the state-of-the-art tensor network approaches, and is competitive with the benchmark deep learning models on MNIST and Fashion-MNIST datasets.

0
0
下载
预览
Top