August 9, 2022

Speaker-adaptive Lip Reading with User-dependent Padding

Minsu Kim, Hyunjun Kim, Yong Man Ro

From arXiv; accepted at ECCV 2022

Lip reading aims to predict speech from lip movements alone. Because it models speech from visual information only, its performance is inherently sensitive to each person's lip appearance and movements. As a result, lip reading models degrade when applied to unseen speakers, owing to the mismatch between training and testing conditions. Speaker adaptation techniques aim to reduce this mismatch between training and test speakers, guiding a trained model to focus on modeling the speech content without interference from speaker variations. In contrast to the decades of effort in audio-based speech recognition, speaker adaptation methods have not been well studied in lip reading. In this paper, to remedy the performance degradation of lip reading models on unseen speakers, we propose a speaker-adaptive lip reading method, namely user-dependent padding. The user-dependent padding is a speaker-specific input that participates in the visual feature extraction stage of a pre-trained lip reading model. Therefore, the lip appearance and movement information of different speakers can be taken into account during visual feature encoding, adaptively for each speaker. Moreover, the proposed method does not need 1) any additional layers, 2) any modification of the learned weights of the pre-trained model, or 3) speaker labels for the training data used during pre-training. It can adapt directly to unseen speakers by learning only the user-dependent padding, in a supervised or unsupervised manner. Finally, to alleviate the lack of speaker information in public lip reading databases, we label the speakers of a well-known audio-visual database, LRW, and design an unseen-speaker lip reading scenario named LRW-ID.
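
The idea of adapting through the padding alone can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single 1D convolution with a frozen 3-tap kernel, a padding width of one on each side, and a plain squared-error target standing in for the lip reading loss. The function names `conv_with_padding` and `adapt_padding` are hypothetical. It only shows that the border values normally filled with zeros can be treated as learnable, speaker-specific parameters while the kernel weights stay untouched.

```python
import numpy as np


def conv_with_padding(x, kernel, pad_left, pad_right):
    """Cross-correlate x with a 3-tap kernel, using the given border values
    in place of the usual zero padding (padding width 1 on each side)."""
    padded = np.concatenate(([pad_left], x, [pad_right]))
    # "valid" mode over the padded signal gives one output per input sample.
    return np.correlate(padded, kernel, mode="valid")


def adapt_padding(x, target, kernel, steps=100, lr=1.0):
    """Fit only the two padding values by gradient descent on
    0.5 * sum((output - target) ** 2); the kernel is never updated."""
    pad_left, pad_right = 0.0, 0.0  # start from ordinary zero padding
    for _ in range(steps):
        err = conv_with_padding(x, kernel, pad_left, pad_right) - target
        # With a 3-tap kernel, only the first output depends on pad_left
        # (through kernel[0]) and only the last on pad_right (kernel[-1]).
        pad_left -= lr * err[0] * kernel[0]
        pad_right -= lr * err[-1] * kernel[-1]
    return pad_left, pad_right


# Toy setup: a "pre-trained" kernel that stays frozen during adaptation.
kernel = np.array([0.5, 1.0, 0.5])
kernel_before = kernel.copy()
x = np.array([1.0, 2.0, 3.0, 4.0])

# Pretend the new speaker is best served by these border values (assumed).
target = conv_with_padding(x, kernel, 1.5, -0.7)

learned_left, learned_right = adapt_padding(x, target, kernel)
```

In this toy setting the two padding values are the only quantities that change, which mirrors the abstract's claim that no extra layers and no weight updates are needed. A real model would have learnable padding for every padded convolution in the visual front-end and would be trained with the task loss.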

