Prabhupadavani: A Code-mixed Speech Translation Data for 25 Languages

Speech Translation · Dataset · Attention · Knowledge · Machine Translation

4 September 2022

Jivnesh Sandhan, Ayush Daksh, Om Adideva Paranjay, Laxmidhar Behera, Pawan Goyal

From arXiv. The work is accepted at the COLING22-SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature.

Interest in code-mixing has become ubiquitous in Natural Language Processing (NLP); however, little attention has been given to this phenomenon in the Speech Translation (ST) task. This can be attributed solely to the lack of labelled data for code-mixed ST. We therefore introduce Prabhupadavani, a multilingual code-mixed ST dataset for 25 languages. It is multi-domain, covers ten language families, and contains 94 hours of speech by 130+ speakers, manually aligned with corresponding text in the target language. Prabhupadavani concerns Vedic culture and heritage drawn from Indic literature, where code-switching when quoting from literature is important in the context of humanities teaching. To the best of our knowledge, Prabhupadavani is the first multilingual code-mixed ST dataset available in the ST literature. The data can also be used for a code-mixed machine translation task. The full dataset can be accessed at https://github.com/frozentoad9/CMST.

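Speech translation

Direct speech translation between languages by computer, which helps people from different language backgrounds communicate, has become a research focus around the world. Unlike ordinary text translation, speech translation has to integrate three technologies (speech recognition, machine translation, and speech synthesis), which makes it highly challenging.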
