用多式联运变压器检测表达式 (Detecting expressions with multimodal transformers) - 专知论文

会员服务 ·

0

多峰值 · Performer · Better · 层 · 变换 ·

2020 年 11 月 30 日

Detecting expressions with multimodal transformers

翻译：用多式联运变压器检测表达式

Srinivas Parthasarathy,Shiva Sundaram

from arxiv, IEEE Spoken Language Technology Workshop 2021

Developing machine learning algorithms to understand person-to-person engagement can result in natural user experiences for communal devices such as Amazon Alexa. Among other cues such as voice activity and gaze, a person's audio-visual expression that includes tone of the voice and facial expression serves as an implicit signal of engagement between parties in a dialog. This study investigates deep-learning algorithms for audio-visual detection of user's expression. We first implement an audio-visual baseline model with recurrent layers that shows competitive results compared to current state of the art. Next, we propose the transformer architecture with encoder layers that better integrate audio-visual features for expressions tracking. Performance on the Aff-Wild2 database shows that the proposed methods perform better than baseline architecture with recurrent layers with absolute gains approximately 2% for arousal and valence descriptors. Further, multimodal architectures show significant improvements over models trained on single modalities with gains of up to 3.6%. Ablation studies show the significance of the visual modality for the expression detection on the Aff-Wild2 database.

翻译：开发机器学习算法,以了解人与人之间的接触,可以产生亚马逊亚历山德拉等公共装置的自然用户经验。在声音活动和凝视等其他提示中,一个人的视听表达方式,包括声音和面部表达的音调,可以作为各方在对话中接触的隐含信号。本研究调查了用于对用户表达方式进行视听检测的深学习算法。我们首先采用了一个具有经常性层次的视听基线模型,该层次显示与目前艺术状态相比具有竞争性的结果。接下来,我们建议采用具有编码层的变压器结构,将视听特征更好地结合到语音跟踪中。 Aff-Wild2 数据库的绩效显示,拟议方法的运行优于基线结构,经常层的运行绝对收益约为2%。此外,多式结构显示对单一模式培训模型的重大改进,其收益高达3.6%。对比研究显示Aff-Wild2数据库的语音检测视觉模式的重要性。

0

相关内容

多峰值

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

【商汤科技】可变形Transformers端到端对象检测，Deformable DETR

【商汤科技】可变形Transformers端到端对象检测，Deformable DETR

专知会员服务

33+阅读 · 2020年10月11日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

已删除

将门创投

7+阅读 · 2019年10月15日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

计算机视觉领域顶会CVPR 2018 接受论文列表

计算机视觉领域顶会CVPR 2018 接受论文列表

专知

7+阅读 · 2018年5月26日

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

统计学习与视觉计算组

17+阅读 · 2018年3月16日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

A survey of joint intent detection and slot-filling models in natural language understanding

Arxiv

0+阅读 · 2021年1月20日

Transformer-based Language Model Fine-tuning Methods for COVID-19 Fake News Detection

Transformer-based Language Model Fine-tuning Methods for COVID-19 Fake News Detection

Arxiv

0+阅读 · 2021年1月18日

Interpretable Multi-Head Self-Attention model for Sarcasm Detection in social media

Arxiv

1+阅读 · 2021年1月14日

Efficient Transformers: A Survey

Arxiv

23+阅读 · 2020年9月16日

Language Modeling with Deep Transformers

Arxiv

6+阅读 · 2019年7月11日

Augmentation for small object detection

Augmentation for small object detection

Arxiv

11+阅读 · 2019年2月19日

A Compact Embedding for Facial Expression Similarity

A Compact Embedding for Facial Expression Similarity

Arxiv

3+阅读 · 2019年1月9日

DeepFakes: a New Threat to Face Recognition? Assessment and Detection

Arxiv

6+阅读 · 2018年12月20日

Dissecting Contextual Word Embeddings: Architecture and Representation

Dissecting Contextual Word Embeddings: Architecture and Representation

Arxiv

22+阅读 · 2018年8月27日

The challenge of simultaneous object detection and pose estimation: a comparative study

Arxiv

6+阅读 · 2018年1月24日

VIP会员

文章信息

相关主题

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

【商汤科技】可变形Transformers端到端对象检测，Deformable DETR

【商汤科技】可变形Transformers端到端对象检测，Deformable DETR

专知会员服务

33+阅读 · 2020年10月11日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

GPT-5如何对齐？从硬性拒绝到安全完成：走向以输出为中心的安全训练

【伯克利博士论文】超越人类监督的视觉智能

【ICCV2025】SO(3) 上连续非保守动力系统的预测

2025年中国数据要素行业发展研究报告

相关资讯

已删除

将门创投

7+阅读 · 2019年10月15日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

计算机视觉领域顶会CVPR 2018 接受论文列表

计算机视觉领域顶会CVPR 2018 接受论文列表

专知

7+阅读 · 2018年5月26日

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

统计学习与视觉计算组

17+阅读 · 2018年3月16日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A survey of joint intent detection and slot-filling models in natural language understanding

Arxiv

0+阅读 · 2021年1月20日

Transformer-based Language Model Fine-tuning Methods for COVID-19 Fake News Detection

Transformer-based Language Model Fine-tuning Methods for COVID-19 Fake News Detection

Arxiv

0+阅读 · 2021年1月18日

Interpretable Multi-Head Self-Attention model for Sarcasm Detection in social media

Arxiv

1+阅读 · 2021年1月14日

Efficient Transformers: A Survey

Arxiv

23+阅读 · 2020年9月16日

Language Modeling with Deep Transformers

Arxiv

6+阅读 · 2019年7月11日

Augmentation for small object detection

Augmentation for small object detection

Arxiv

11+阅读 · 2019年2月19日

A Compact Embedding for Facial Expression Similarity

A Compact Embedding for Facial Expression Similarity

Arxiv

3+阅读 · 2019年1月9日

DeepFakes: a New Threat to Face Recognition? Assessment and Detection

Arxiv

6+阅读 · 2018年12月20日

Dissecting Contextual Word Embeddings: Architecture and Representation

Dissecting Contextual Word Embeddings: Architecture and Representation

Arxiv

22+阅读 · 2018年8月27日

The challenge of simultaneous object detection and pose estimation: a comparative study

Arxiv

6+阅读 · 2018年1月24日

微信扫码咨询专知VIP会员