LipFormer: Learning to Lipread Unseen Speakers based on Visual-Landmark Transformers - Zhuanzhi Papers


February 4, 2023

LipFormer: Learning to Lipread Unseen Speakers based on Visual-Landmark Transformers


Feng Xue, Yu Li, Deyin Liu, Yincen Xie, Lin Wu, Richang Hong

From arXiv, under review

Lipreading refers to understanding the speech of a speaker in a video and translating it into natural language. State-of-the-art lipreading methods excel at interpreting overlapped speakers, i.e., speakers who appear in both the training and inference sets. However, generalizing these methods to unseen speakers incurs catastrophic performance degradation, due to the limited number of speakers in the training set and the evident visual variations caused by the shape and color of different speakers' lips. Depending merely on the visible changes of the lips therefore tends to cause model overfitting. To address this problem, we propose to use multi-modal features across visual appearance and facial landmarks, which can describe lip motion irrespective of speaker identity. We then develop a sentence-level lipreading framework based on visual-landmark transformers, namely LipFormer. Specifically, LipFormer consists of a lip motion stream, a facial landmark stream, and a cross-modal fusion. The embeddings from the two streams are produced by self-attention and are then fed to a cross-attention module to align the visual and landmark features. Finally, the resulting fused features are decoded into output text by a cascaded seq2seq model. Experiments demonstrate that our method can effectively enhance model generalization to unseen speakers.
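The two-stream fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no dimensions, layer counts, or projection details, so everything below is an assumption for illustration, including the function names `attention` and `fuse`, the choice of visual embeddings as queries and landmark embeddings as keys and values, the omission of learned projection matrices, and the concatenation of the two results. It only shows self-attention per stream followed by cross-attention between streams, in plain Python.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Learned Q/K/V projections are omitted here (an assumption made
    to keep the sketch short); a real model would include them.
    """
    d = len(keys[0])
    dim_out = len(values[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(dim_out)
        ])
    return out


def fuse(visual, landmark):
    """Hypothetical fusion: self-attention per stream, then cross-attention."""
    # Self-attention inside each stream (one embedding per video frame).
    vis = attention(visual, visual, visual)
    lmk = attention(landmark, landmark, landmark)
    # Cross-attention: visual frames query the landmark stream (assumed direction).
    aligned = attention(vis, lmk, lmk)
    # Concatenate visual and aligned landmark features per frame (assumed).
    return [v + a for v, a in zip(vis, aligned)]


# Toy input: 5 frames, 4-dimensional embeddings per stream (made-up sizes).
random.seed(0)
visual = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
landmark = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
fused = fuse(visual, landmark)
print(len(fused), len(fused[0]))
```

In this sketch the fused sequence keeps one vector per frame, which is the shape a downstream sequence decoder would consume. The seq2seq decoding stage mentioned in the abstract is not modeled here.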

