Natural language processing technology has rapidly improved automated grammatical error correction, and the community has begun to explore document-level revision as one of the next challenges. Moving beyond sentence-level grammatical error correction toward an NLP-based document-level revision assistant faces two major obstacles: (1) few public corpora exist in which document-level revisions are annotated by professional editors, and (2) it is infeasible to elicit all possible references and evaluate revision quality against them, because the space of possible revisions is unbounded. This paper tackles these challenges. First, we introduce a new document-revision corpus, TETRA, in which professional editors revised academic papers sampled from the ACL Anthology; these papers contain few trivial grammatical errors, enabling us to focus on document- and paragraph-level edits such as coherence and consistency. Second, we explore reference-less and interpretable meta-evaluation methods that can detect quality improvements brought about by document revision. We show the uniqueness of TETRA compared with existing document-revision corpora and demonstrate that a fine-tuned pre-trained language model can discriminate the quality of documents after revision even when the difference is subtle. This promising result should encourage the community to further explore automated document revision models and metrics in the future.