A recurrent neural network (RNN) is a class of artificial neural network in which the connections between nodes form a directed graph along a temporal sequence, which allows the network to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process input sequences of variable length, which makes them well suited to tasks such as unsegmented, connected handwriting recognition or speech recognition.
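As a minimal illustration of the recurrence described above, the NumPy sketch below shows how a vanilla RNN cell reuses the same weights at every time step and carries a hidden state (its memory) across a variable-length input sequence. The weight names and dimensions are illustrative assumptions, not a specific published model.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN cell over a variable-length input sequence.

    inputs: array of shape (T, input_dim); T may differ per sequence.
    Returns the hidden state at every step, shape (T, hidden_dim).
    """
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim)          # internal state ("memory"), starts empty
    states = []
    for x_t in inputs:                # same weights are reused at every time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

# Example: a sequence of 7 steps with 3-dimensional inputs and a 5-unit hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(5, 3)) * 0.1  # input-to-hidden weights (illustrative sizes)
W_hh = rng.normal(size=(5, 5)) * 0.1  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(5)
outputs = rnn_forward(rng.normal(size=(7, 3)), W_xh, W_hh, b_h)
print(outputs.shape)  # (7, 5)
```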

VIP Content

Coevolving time series arise in many applications such as environmental monitoring, financial analysis, and intelligent transportation. This paper addresses two challenges: (1) how to incorporate the explicit relation networks of the time series, and (2) how to model the implicit relations of the temporal dynamics. The authors propose a new model, called the tensor time series network, which consists of two modules: a Tensor Graph Convolutional Network (TGCN) and a Tensor Recurrent Neural Network (TRNN). TGCN addresses the first challenge by generalizing graph convolutional networks (GCNs) for flat graphs to tensor graphs, capturing the synergy among the multiple graphs associated with the tensor. TRNN leverages tensor decomposition to model the implicit relations among the coevolving time series. Experimental results on five real-world datasets demonstrate the effectiveness of the proposed method.

https://www.zhuanzhi.ai/paper/5e20a637eb1e0ed8aa9bb03dceecb198
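The full formulation is in the paper linked above. Purely as a rough, hypothetical sketch of one ingredient of the idea, the code below propagates a single tensor snapshot along a separate normalized graph per mode (a flat-GCN-style propagation applied mode by mode); the function names, shapes, and the averaging of the mode-wise results are assumptions for illustration and do not reproduce TGCN or TRNN.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalize an adjacency matrix with self-loops: D^{-1/2}(A+I)D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def mode_product(X, M, mode):
    """Multiply tensor X by matrix M along the given mode (tensor n-mode product)."""
    X = np.moveaxis(X, mode, 0)
    shape = X.shape
    out = M @ X.reshape(shape[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + shape[1:]), 0, mode)

def tensor_graph_conv(X, adjs):
    """Hypothetical mode-wise graph propagation: push the tensor snapshot X along the
    normalized graph of every mode, then average the results (an illustrative choice)."""
    outputs = [mode_product(X, normalize_adj(A), m) for m, A in enumerate(adjs)]
    return np.mean(outputs, axis=0)

# Toy snapshot of a tensor time series: 4 locations x 3 sensors, one graph per mode.
rng = np.random.default_rng(0)
X = rng.random((4, 3))
A_loc = np.maximum(*(lambda R: (R, R.T))((rng.random((4, 4)) > 0.5).astype(float)))
A_sensor = np.maximum(*(lambda R: (R, R.T))((rng.random((3, 3)) > 0.5).astype(float)))
H = tensor_graph_conv(X, [A_loc, A_sensor])
print(H.shape)  # (4, 3)
```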


Latest Content

The use of multiple, semantically correlated sources can provide complementary information that may not be evident when working with individual modalities on their own. In this context, multi-modal models can help produce more accurate and robust predictions in machine learning tasks where audio-visual data is available. This paper presents a multi-modal model for automatic scene classification that simultaneously exploits auditory and visual information. The proposed approach makes use of two separate networks, each trained in isolation on audio or visual data, so that each network specializes in a given modality. The visual subnetwork is a pre-trained VGG16 model followed by a bidirectional recurrent layer, while the residual audio subnetwork is based on stacked squeeze-excitation convolutional blocks trained from scratch. After training each subnetwork, the fusion of information from the audio and visual streams is performed at two different stages. The early fusion stage combines features from the last convolutional block of the respective subnetworks at different time steps to feed a bidirectional recurrent structure. The late fusion stage combines the output of the early fusion stage with the independent predictions provided by the two subnetworks, resulting in the final prediction. We evaluate the method using the recently published TAU Audio-Visual Urban Scenes 2021 dataset, which contains synchronized audio and video recordings from 12 European cities in 10 different scene classes. The proposed model provides an excellent trade-off between prediction performance (86.5%) and system complexity (15M parameters) in the evaluation results of the DCASE 2021 Challenge.
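A hedged PyTorch sketch of the two-stage fusion scheme described above follows; the feature dimensions, the choice of a GRU, and the simple averaging in the late fusion stage are illustrative assumptions rather than the authors' published configuration (which builds on a pre-trained VGG16 visual stream and squeeze-excitation audio blocks).

```python
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    """Illustrative two-stage fusion: early fusion of per-step features through a
    bidirectional recurrent layer, then late fusion with per-modality predictions."""

    def __init__(self, audio_dim=128, visual_dim=512, hidden=64, n_classes=10):
        super().__init__()
        # Early fusion: concatenated per-time-step features feed a bidirectional GRU.
        self.rnn = nn.GRU(audio_dim + visual_dim, hidden,
                          batch_first=True, bidirectional=True)
        self.early_head = nn.Linear(2 * hidden, n_classes)
        # Independent per-modality heads whose predictions enter the late fusion stage.
        self.audio_head = nn.Linear(audio_dim, n_classes)
        self.visual_head = nn.Linear(visual_dim, n_classes)

    def forward(self, audio_feats, visual_feats):
        # audio_feats: (batch, T, audio_dim); visual_feats: (batch, T, visual_dim)
        fused, _ = self.rnn(torch.cat([audio_feats, visual_feats], dim=-1))
        early_logits = self.early_head(fused[:, -1])                # early-fusion prediction
        audio_logits = self.audio_head(audio_feats.mean(dim=1))     # audio-only prediction
        visual_logits = self.visual_head(visual_feats.mean(dim=1))  # visual-only prediction
        # Late fusion: combine the early-fusion output with the independent predictions
        # (simple averaging here, purely as an illustrative choice).
        return (early_logits + audio_logits + visual_logits) / 3

model = AudioVisualFusion()
scene_logits = model(torch.randn(2, 10, 128), torch.randn(2, 10, 512))
print(scene_logits.shape)  # torch.Size([2, 10]) -> one score per scene class
```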

