arXivEdits: Understanding the Human Revision Process in Scientific Writing - 专知论文


October 26, 2022

arXivEdits: Understanding the Human Revision Process in Scientific Writing


Chao Jiang, Wei Xu, Samuel Stevens

From arXiv. This paper has been accepted to EMNLP 2022.

Scientific publications are the primary means to communicate research discoveries, where the writing quality is of crucial importance. However, prior work studying the human editing process in this domain mainly focused on the abstract or introduction sections, resulting in an incomplete picture. In this work, we provide a complete computational framework for studying text revision in scientific writing. We first introduce arXivEdits, a new annotated corpus of 751 full papers from arXiv with gold sentence alignment across their multiple versions of revision, as well as fine-grained span-level edits and their underlying intentions for 1,000 sentence pairs. It supports our data-driven analysis to unveil the common strategies practiced by researchers for revising their papers. To scale up the analysis, we also develop automatic methods to extract revision at document-, sentence-, and word-levels. A neural CRF sentence alignment model trained on our corpus achieves 93.8 F1, enabling the reliable matching of sentences between different versions. We formulate the edit extraction task as a span alignment problem, and our proposed method extracts more fine-grained and explainable edits, compared to the commonly used diff algorithm. An intention classifier trained on our dataset achieves 78.9 F1 on the fine-grained intent classification task. Our data and system are released at tiny.one/arxivedits.
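To make the comparison with "the commonly used diff algorithm" concrete, here is a minimal sketch of that kind of baseline, not the authors' span alignment method. It assumes whitespace tokenization and uses Python's standard `difflib.SequenceMatcher`; the function name `word_level_edits` and the two example sentences are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def word_level_edits(old_sentence, new_sentence):
    """Return (operation, old_words, new_words) for each changed span.

    Illustrative diff-style baseline only: tokens are split on whitespace
    and compared with difflib, which is an assumption of this sketch.
    """
    old_words = old_sentence.split()
    new_words = new_sentence.split()
    # autojunk=False disables difflib's popularity heuristic so that
    # frequent tokens are still considered when matching.
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged span, not an edit
        edits.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return edits


# Hypothetical pair of sentence versions from two revisions of a paper.
old = "We propose a new method for text revision ."
new = "We introduce a novel method for text revision ."
for edit in word_level_edits(old, new):
    print(edit)
```

A diff of this kind reports only surface operations (`replace`, `insert`, `delete`) over contiguous token runs. It carries no notion of the intention behind an edit, which is the gap the abstract says the span alignment formulation and the intention classifier are meant to address.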

