Sign language gloss translation aims to translate sign glosses into spoken language texts, which is challenging due to the scarcity of labeled gloss-text parallel data. Back translation (BT), which generates pseudo-parallel data by translating in-domain spoken language texts into sign glosses, has been applied to alleviate this data scarcity problem. However, the lack of large-scale, high-quality in-domain spoken language text data limits the effectiveness of BT. In this paper, to overcome this limitation, we propose a Prompt-based domain text Generation (PGEN) approach to produce large-scale in-domain spoken language text data. Specifically, PGEN randomly concatenates sentences from the original in-domain spoken language text data as prompts to induce a pre-trained language model (i.e., GPT-2) to generate spoken language texts in a similar style. Experimental results on three benchmarks of sign language gloss translation in varied languages demonstrate that BT with spoken language texts generated by PGEN significantly outperforms the compared methods. In addition, as the scale of spoken language texts generated by PGEN increases, the BT technique achieves further improvements, demonstrating the effectiveness of our approach. We release the code and data to facilitate future research in this field.
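To make the prompt-based generation idea concrete, here is a minimal sketch assuming the HuggingFace transformers library and an off-the-shelf GPT-2 checkpoint; the function name, sampling hyperparameters, and prompt size are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of prompt-based in-domain text generation (PGEN-style), under the
# assumptions stated above: sample a few in-domain sentences, concatenate
# them as a prompt, and let GPT-2 continue in a similar style.
import random
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def generate_in_domain_text(corpus, num_prompt_sents=3, max_new_tokens=64):
    """Hypothetical helper: build a prompt from randomly sampled in-domain
    sentences and return GPT-2's continuation as pseudo in-domain text."""
    prompt = " ".join(random.sample(corpus, num_prompt_sents))
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,   # sampling yields diverse generations for scaling up BT
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens; keep only the newly generated continuation.
    continuation = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return continuation

# Usage: pass a list of in-domain spoken language sentences; the returned
# text can then be back-translated into glosses to form pseudo-parallel data.
```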