

September 15, 2021

Is "moby dick" a Whale or a Bird? Named Entities and Terminology in Speech Translation


Marco Gaido, Susana Rodríguez, Matteo Negri, Luisa Bentivogli, Marco Turchi

From arXiv; accepted at EMNLP 2021

Automatic translation systems are known to struggle with rare words. Among these, named entities (NEs) and domain-specific terms are crucial, since errors in their translation can lead to severe meaning distortions. Despite their importance, previous speech translation (ST) studies have neglected them, also due to the dearth of publicly available resources tailored to their specific evaluation. To fill this gap, we i) present the first systematic analysis of the behavior of state-of-the-art ST systems in translating NEs and terminology, and ii) release NEuRoparl-ST, a novel benchmark built from European Parliament speeches annotated with NEs and terminology. Our experiments on the three language directions covered by our benchmark (en->es/fr/it) show that ST systems correctly translate 75-80% of terms and 65-70% of NEs, with very low performance (37-40%) on person names.
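To make the reported percentages concrete, the sketch below shows one way a per-category accuracy for NEs and terms could be computed. It is an illustration only, not the paper's scoring procedure: it assumes each reference is annotated with the expected target-language strings and counts an item as correct when that string appears verbatim (case-sensitive) in the system output. The function name `entity_accuracy`, the category labels, and the toy sentences are all hypothetical.

```python
from collections import defaultdict


def entity_accuracy(samples):
    """Return per-category accuracy for annotated entities.

    samples: iterable of (hypothesis, annotations), where annotations is a
    list of (expected_text, category) pairs. Assumption: an entity counts as
    correctly translated when expected_text occurs verbatim in the hypothesis.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for hypothesis, annotations in samples:
        for expected_text, category in annotations:
            totals[category] += 1
            if expected_text in hypothesis:  # case-sensitive substring match
                hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}


# Invented toy data: system outputs paired with expected entity strings.
samples = [
    ("El señor Barroso habló en Bruselas.",
     [("Barroso", "PERSON"), ("Bruselas", "GPE")]),
    ("La señora Merkle visitó Bruselas.",
     [("Merkel", "PERSON"), ("Bruselas", "GPE")]),
]

scores = entity_accuracy(samples)
for category, accuracy in sorted(scores.items()):
    print(f"{category}: {accuracy:.0%}")
```

In the toy data the misspelled person name is counted as an error while both place names match, which mirrors the kind of gap between person names and other categories that the abstract reports. Plain substring matching is a simplification; a real evaluation would need to handle tokenization and repeated mentions.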

