We present models that complete missing text in transliterations of ancient Mesopotamian documents, originally written on cuneiform clay tablets (2500 BCE–100 CE). Because the tablets have deteriorated, scholars often rely on contextual cues to fill in the missing parts manually, a subjective and time-consuming process. We observe that this challenge can be formulated as a masked language modelling task, used mostly as a pretraining objective for contextualized language models. We then develop several architectures focusing on the Akkadian language, the lingua franca of the time. We find that despite data scarcity (1M tokens), we can achieve state-of-the-art performance on missing-token prediction (89% hit@5) using a greedy decoding scheme and pretraining on data from other languages and different time periods. Finally, we conduct human evaluations showing the applicability of our models in assisting experts to transcribe texts in extinct languages.
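To make the masked language modelling formulation and the hit@5 metric concrete, here is a minimal Python sketch using the HuggingFace `transformers` fill-mask pipeline. The model name, the example passage, and the gold token are illustrative placeholders, not the paper's Akkadian-specific models or data.

```python
# Illustrative sketch of masked-token completion and the hit@5 metric.
# Assumes an off-the-shelf multilingual model; the authors' Akkadian
# models and training data are not used here.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# A damaged passage: the broken sign is replaced by the model's mask token.
# (English stand-in for an Akkadian transliteration.)
text = "the king of [MASK] built this temple"

# Ask for the five most likely completions; the prediction counts as a
# hit@5 if the gold token appears anywhere in this top-5 list.
candidates = fill_mask(text, top_k=5)
gold = "Assyria"  # hypothetical gold token for this toy example
hit_at_5 = any(c["token_str"].strip() == gold for c in candidates)
print([c["token_str"] for c in candidates], "hit@5:", hit_at_5)
```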