多语种TEDx语音识别和翻译公司 (The Multilingual TEDx Corpus for Speech Recognition and Translation) - 专知论文

会员服务 ·

0

TEDx · 语音识别 · Extensibility · Performer · 语音翻译 ·

2021 年 2 月 2 日

The Multilingual TEDx Corpus for Speech Recognition and Translation

翻译：多语种TEDx语音识别和翻译公司

Elizabeth Salesky,Matthew Wiesner,Jacob Bremerman,Roldano Cattoni,Matteo Negri,Marco Turchi,Douglas W. Oard,Matt Post

We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. The corpus is a collection of audio recordings from TEDx talks in 8 source languages. We segment transcripts into sentences and align them to the source-language audio and target-language translations. The corpus is released along with open-sourced code enabling extension to new talks and languages as they become available. Our corpus creation methodology can be applied to more languages than previous work, and creates multi-way parallel evaluation sets. We provide baselines in multiple ASR and ST settings, including multilingual models to improve translation performance for low-resource language pairs.

翻译：我们展示了多语言TEDx文集,该文集是为支持许多非英语语言语言的语音识别和语音翻译研究而建立的,以8种源语言收集TEDx会谈的录音记录,我们将笔录分为句子,并将其与原始语言的音频和目标语言翻译相统一,该文集与开放源代码一起发布,允许在新语和语言可用时将其扩展为新的话语和语言。我们的创制方法可以适用于比以往更多的语言,并创建多路平行的评价组。我们在多种ASR和ST设置中提供了基线,包括提高低资源语言对口翻译功能的多语言模型。

0

相关内容

TEDx

在”传递优秀思想“这一价值的指引下，TED推出了一个叫TEDx的项目。所谓TEDx，就是指那些由本地TED粉丝自愿发起、自行组织的小型聚会，让本地的TED粉丝能够聚到一起，共享TED一刻。
TED不会参与TEDx活动的组织，但是会对活动组织者给予指导和意见。

TEDx的价值交流、分享、互动、学习，此乃TEDx的核心价值。

多语言神经机器翻译综述论文，34页pdf，A Comprehensive Survey of Multilingual Neural Machine Translation

多语言神经机器翻译综述论文，34页pdf，A Comprehensive Survey of Multilingual Neural Machine Translation

专知会员服务

19+阅读 · 2020年4月25日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【论文】多语言神经机器翻译综述（A Comprehensive Survey of Multilingual Neural Machine Translation）

【论文】多语言神经机器翻译综述（A Comprehensive Survey of Multilingual Neural Machine Translation）

专知会员服务

20+阅读 · 2020年1月7日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

自然语言处理 (NLP)资源大全

自然语言处理 (NLP)资源大全

机械鸡

35+阅读 · 2017年9月17日

English-Twi Parallel Corpus for Machine Translation

Arxiv

0+阅读 · 2021年3月29日

Construction of a Large-scale Japanese ASR Corpus on TV Recordings

Arxiv

0+阅读 · 2021年3月26日

Curriculum Pre-training for End-to-End Speech Translation

Arxiv

4+阅读 · 2020年4月21日

Speech2Action: Cross-modal Supervision for Action Recognition

Speech2Action: Cross-modal Supervision for Action Recognition

Arxiv

7+阅读 · 2020年3月30日

Sockeye: A Toolkit for Neural Machine Translation

Arxiv

7+阅读 · 2018年6月1日

VIP会员

文章信息

相关主题

相关VIP内容

多语言神经机器翻译综述论文，34页pdf，A Comprehensive Survey of Multilingual Neural Machine Translation

多语言神经机器翻译综述论文，34页pdf，A Comprehensive Survey of Multilingual Neural Machine Translation

专知会员服务

19+阅读 · 2020年4月25日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【论文】多语言神经机器翻译综述（A Comprehensive Survey of Multilingual Neural Machine Translation）

【论文】多语言神经机器翻译综述（A Comprehensive Survey of Multilingual Neural Machine Translation）

专知会员服务

20+阅读 · 2020年1月7日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《利用人工智能改善军事警察行动：当下现状探索》最新95页报告

《用于适应性、任务就绪型军用仿生机器人的合成数据管道》

面向现代武装力量的高级AI驱动军事模拟与训练软件

《军事应用中的AI：建立信任》最新报告

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

自然语言处理 (NLP)资源大全

自然语言处理 (NLP)资源大全

机械鸡

35+阅读 · 2017年9月17日

相关论文

English-Twi Parallel Corpus for Machine Translation

Arxiv

0+阅读 · 2021年3月29日

Construction of a Large-scale Japanese ASR Corpus on TV Recordings

Arxiv

0+阅读 · 2021年3月26日

Curriculum Pre-training for End-to-End Speech Translation

Arxiv

4+阅读 · 2020年4月21日

Speech2Action: Cross-modal Supervision for Action Recognition

Speech2Action: Cross-modal Supervision for Action Recognition

Arxiv

7+阅读 · 2020年3月30日

Sockeye: A Toolkit for Neural Machine Translation

Arxiv

7+阅读 · 2018年6月1日

微信扫码咨询专知VIP会员