在机器学习中,使用基于梯度的学习方法和反向传播训练人工神经网络时,会遇到梯度消失的问题。在这种方法中,每个神经网络的权值在每次迭代训练时都得到一个与误差函数对当前权值的偏导数成比例的更新。问题是,在某些情况下,梯度会极小,有效地阻止权值的改变。在最坏的情况下,这可能会完全阻止神经网络进一步的训练。作为问题原因的一个例子,传统的激活函数,如双曲正切函数的梯度在范围(0,1),而反向传播通过链式法则计算梯度。这样做的效果是将n个这些小数字相乘来计算n层网络中“前端”层的梯度,这意味着梯度(误差信号)随着n的增加呈指数递减,而前端层的训练非常缓慢。

最新论文

Humans express their opinions and emotions through multiple modalities which mainly consist of textual, acoustic and visual modalities. Prior works on multimodal sentiment analysis mostly apply Recurrent Neural Network (RNN) to model aligned multimodal sequences. However, it is unpractical to align multimodal sequences due to different sample rates for different modalities. Moreover, RNN is prone to the issues of gradient vanishing or exploding and it has limited capacity of learning long-range dependency which is the major obstacle to model unaligned multimodal sequences. In this paper, we introduce Graph Capsule Aggregation (GraphCAGE) to model unaligned multimodal sequences with graph-based neural model and Capsule Network. By converting sequence data into graph, the previously mentioned problems of RNN are avoided. In addition, the aggregation capability of Capsule Network and the graph-based structure enable our model to be interpretable and better solve the problem of long-range dependency. Experimental results suggest that GraphCAGE achieves state-of-the-art performance on two benchmark datasets with representations refined by Capsule Network and interpretation provided.

0
0
下载
预览
Top