自动音频字幕生成的图注意力 (Graph Attention for Automated Audio Captioning) - 专知论文

会员服务 ·

0

音频信号 · 长期依赖 · 上下文 · 关联 · 解码 ·

2023 年 4 月 7 日

Graph Attention for Automated Audio Captioning

翻译：自动音频字幕生成的图注意力

Feiyang Xiao,Jian Guan,Qiaoxi Zhu,Wenwu Wang

State-of-the-art audio captioning methods typically use the encoder-decoder structure with pretrained audio neural networks (PANNs) as encoders for feature extraction. However, the convolution operation used in PANNs is limited in capturing the long-time dependencies within an audio signal, thereby leading to potential performance degradation in audio captioning. This letter presents a novel method using graph attention (GraphAC) for encoder-decoder based audio captioning. In the encoder, a graph attention module is introduced after the PANNs to learn contextual association (i.e. the dependency among the audio features over different time frames) through an adjacency graph, and a top-k mask is used to mitigate the interference from noisy nodes. The learnt contextual association leads to a more effective feature representation with feature node aggregation. As a result, the decoder can predict important semantic information about the acoustic scene and events based on the contextual associations learned from the audio signal. Experimental results show that GraphAC outperforms the state-of-the-art methods with PANNs as the encoders, thanks to the incorporation of the graph attention module into the encoder for capturing the long-time dependencies within the audio signal. The source code is available at https://github.com/LittleFlyingSheep/GraphAC.

翻译：摘要：目前最先进的音频字幕生成方法通常使用编码器-解码器结构，其中使用预训练的音频神经网络（PANN）作为编码器进行特征提取。然而，PANN中使用的卷积操作在捕捉音频信号中的长期依赖性方面存在局限性，从而可能导致音频字幕的性能下降。本文提出了一种基于图注意力（GraphAC）的新方法来进行编码器-解码器音频字幕生成。在编码器中，引入了一个图注意力模块来学习通过邻接图的上下文关联（即不同时间帧中音频特征之间的依赖关系），并使用top-k掩模来减轻噪声节点的干扰。学习到的上下文关联通过特征节点聚合产生更有效的特征表示。因此，解码器可以基于从音频信号中学习到的上下文关联预测关于声学场景和事件的重要语义信息。实验结果表明，由于在编码器中引入了图注意力模块来捕捉音频信号中的长期依赖关系，因此GraphAC在超过使用PANN作为编码器的先进方法上表现出色。源代码可在https://github.com/LittleFlyingSheep/GraphAC上获得。

0

相关内容

音频信号

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

近期必读的五篇ICLR 2021【图神经网络（GNN）】相关论文和代码

近期必读的五篇ICLR 2021【图神经网络（GNN）】相关论文和代码

专知会员服务

69+阅读 · 2021年2月25日

近期必读的五篇 IJCAI 2020【图神经网络 (GNN)+NLP 】相关论文

近期必读的五篇 IJCAI 2020【图神经网络 (GNN)+NLP 】相关论文

专知会员服务

76+阅读 · 2020年8月18日

GRAPH-BERT ：学习图表示只需要注意力，GRAPH-BERT : Only Attention is Needed for Learning Graph Representations

GRAPH-BERT ：学习图表示只需要注意力，GRAPH-BERT : Only Attention is Needed for Learning Graph Representations

专知会员服务

78+阅读 · 2020年5月31日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【芝加哥大学】GRAPH-BERT: Only Attention is Needed for Learning Graph Representations

【芝加哥大学】GRAPH-BERT: Only Attention is Needed for Learning Graph Representations

专知会员服务

85+阅读 · 2020年1月15日

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

专知会员服务

92+阅读 · 2019年12月22日

六篇 EMNLP 2019【图神经网络(GNN)+NLP】相关论文

六篇 EMNLP 2019【图神经网络(GNN)+NLP】相关论文

专知会员服务

72+阅读 · 2019年11月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

一文带你浏览Graph Transformers

一文带你浏览Graph Transformers

PaperWeekly

1+阅读 · 2022年7月8日

AAAI2020 图相关论文集

AAAI2020 图相关论文集

图与推荐

11+阅读 · 2020年7月15日

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

专知

15+阅读 · 2018年6月29日

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

专知

16+阅读 · 2018年5月14日

【论文推荐】最新五篇信息抽取相关论文—端到端深度模型、调研、聊天机器人、自注意力、科学文本

【论文推荐】最新五篇信息抽取相关论文—端到端深度模型、调研、聊天机器人、自注意力、科学文本

专知

13+阅读 · 2018年4月4日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

【论文推荐】最新六篇生成式对抗网络（GAN）相关论文—半监督学习、对偶、交互生成对抗网络、激活、纳什均衡、tempoGAN

【论文推荐】最新六篇生成式对抗网络（GAN）相关论文—半监督学习、对偶、交互生成对抗网络、激活、纳什均衡、tempoGAN

专知

23+阅读 · 2018年2月23日

用于音频子系统的自适应动态电源放大器新结构及其噪声抑制机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

H2S抑制内质网应激在COPD气道上皮细胞凋亡中的作用及机制

国家自然科学基金

0+阅读 · 2015年12月31日

小气道和肺泡成纤维细胞在COPD基质转换中的区域性差异及其机制

国家自然科学基金

0+阅读 · 2014年12月31日

糖尿病视网膜病变患者MR脑结构与功能网络分析

国家自然科学基金

0+阅读 · 2012年12月31日

CuInGaSe2太阳能电池界面结构、界面态及其钝化

国家自然科学基金

0+阅读 · 2012年12月31日

全光场相机的成像理论和方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

Skutterudite/AgSbTe2系纳米复合热电材料研究

国家自然科学基金

0+阅读 · 2012年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

胶体晶体微结构光纤的制备和光学特性研究

国家自然科学基金

0+阅读 · 2011年12月31日

混合策略的机器翻译方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and-Language Navigation

Arxiv

0+阅读 · 2023年5月26日

Masked and Permuted Implicit Context Learning for Scene Text Recognition

Arxiv

0+阅读 · 2023年5月25日

Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

Arxiv

0+阅读 · 2023年5月25日

CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation

Arxiv

0+阅读 · 2023年5月25日

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

Arxiv

0+阅读 · 2023年5月24日

Knowledge Embedding Based Graph Convolutional Network

Knowledge Embedding Based Graph Convolutional Network

Arxiv

24+阅读 · 2021年4月23日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Arxiv

19+阅读 · 2020年3月31日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

Arxiv

40+阅读 · 2019年6月4日

VIP会员

文章信息

相关主题

相关VIP内容

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

近期必读的五篇ICLR 2021【图神经网络（GNN）】相关论文和代码

近期必读的五篇ICLR 2021【图神经网络（GNN）】相关论文和代码

专知会员服务

69+阅读 · 2021年2月25日

近期必读的五篇 IJCAI 2020【图神经网络 (GNN)+NLP 】相关论文

近期必读的五篇 IJCAI 2020【图神经网络 (GNN)+NLP 】相关论文

专知会员服务

76+阅读 · 2020年8月18日

GRAPH-BERT ：学习图表示只需要注意力，GRAPH-BERT : Only Attention is Needed for Learning Graph Representations

GRAPH-BERT ：学习图表示只需要注意力，GRAPH-BERT : Only Attention is Needed for Learning Graph Representations

专知会员服务

78+阅读 · 2020年5月31日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【芝加哥大学】GRAPH-BERT: Only Attention is Needed for Learning Graph Representations

【芝加哥大学】GRAPH-BERT: Only Attention is Needed for Learning Graph Representations

专知会员服务

85+阅读 · 2020年1月15日

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

专知会员服务

92+阅读 · 2019年12月22日

六篇 EMNLP 2019【图神经网络(GNN)+NLP】相关论文

六篇 EMNLP 2019【图神经网络(GNN)+NLP】相关论文

专知会员服务

72+阅读 · 2019年11月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《多智能体不确定环境追逃博弈研究》216页

美智库最新发布《解放军"人机编组协同作战"发展路径：理论与实践》53页

现代战争"杀伤区"理论：空间尺度与结构特征、控制手段与毁伤机制、生存策略与战线转移

《俄军无人机创新技术或已在乌克兰达成"战场空中封锁"作战效果》最新18页报告

相关资讯

一文带你浏览Graph Transformers

一文带你浏览Graph Transformers

PaperWeekly

1+阅读 · 2022年7月8日

AAAI2020 图相关论文集

AAAI2020 图相关论文集

图与推荐

11+阅读 · 2020年7月15日

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

专知

15+阅读 · 2018年6月29日

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

专知

16+阅读 · 2018年5月14日

【论文推荐】最新五篇信息抽取相关论文—端到端深度模型、调研、聊天机器人、自注意力、科学文本

【论文推荐】最新五篇信息抽取相关论文—端到端深度模型、调研、聊天机器人、自注意力、科学文本

专知

13+阅读 · 2018年4月4日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

【论文推荐】最新六篇生成式对抗网络（GAN）相关论文—半监督学习、对偶、交互生成对抗网络、激活、纳什均衡、tempoGAN

【论文推荐】最新六篇生成式对抗网络（GAN）相关论文—半监督学习、对偶、交互生成对抗网络、激活、纳什均衡、tempoGAN

专知

23+阅读 · 2018年2月23日

相关论文

GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and-Language Navigation

Arxiv

0+阅读 · 2023年5月26日

Masked and Permuted Implicit Context Learning for Scene Text Recognition

Arxiv

0+阅读 · 2023年5月25日

Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

Arxiv

0+阅读 · 2023年5月25日

CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation

Arxiv

0+阅读 · 2023年5月25日

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

Arxiv

0+阅读 · 2023年5月24日

Knowledge Embedding Based Graph Convolutional Network

Knowledge Embedding Based Graph Convolutional Network

Arxiv

24+阅读 · 2021年4月23日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Arxiv

19+阅读 · 2020年3月31日

Knowledge Graph Transfer Network for Few-Shot Recognition

Arxiv

15+阅读 · 2019年11月21日

Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

Arxiv

40+阅读 · 2019年6月4日

相关基金

用于音频子系统的自适应动态电源放大器新结构及其噪声抑制机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

H2S抑制内质网应激在COPD气道上皮细胞凋亡中的作用及机制

国家自然科学基金

0+阅读 · 2015年12月31日

小气道和肺泡成纤维细胞在COPD基质转换中的区域性差异及其机制

国家自然科学基金

0+阅读 · 2014年12月31日

糖尿病视网膜病变患者MR脑结构与功能网络分析

国家自然科学基金

0+阅读 · 2012年12月31日

CuInGaSe2太阳能电池界面结构、界面态及其钝化

国家自然科学基金

0+阅读 · 2012年12月31日

全光场相机的成像理论和方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

Skutterudite/AgSbTe2系纳米复合热电材料研究

国家自然科学基金

0+阅读 · 2012年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

胶体晶体微结构光纤的制备和光学特性研究

国家自然科学基金

0+阅读 · 2011年12月31日

混合策略的机器翻译方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员