Text Simplification for Comprehension-based Question-Answering

From arXiv. Accepted as a long paper at the W-NUT Workshop at EMNLP 2021. Also presented at the DeMAL Workshop at The Web Conference (WWW) 2021.

Text simplification is the process of splitting and rephrasing a sentence into a sequence of sentences that is easier to read and understand, while preserving the content and approximating the original meaning. Text simplification has been exploited in NLP applications such as machine translation, summarization, semantic role labeling, and information extraction, which opens a broad avenue for its use in comprehension-based question-answering downstream tasks. In this work, we investigate the effect of text simplification on the task of question-answering using a comprehension context. We release Simple-SQuAD, a simplified version of the widely used SQuAD dataset. First, we outline each step in the dataset creation pipeline, including style transfer, thresholding of sentences showing correct transfer, and offset finding for each answer. Second, we verify the quality of the transferred sentences through various methodologies involving both automated and human evaluation. Third, we benchmark the newly created corpus and perform an ablation study to examine the effect of the simplification process on the SQuAD-based question-answering task. Our experiments show that simplification yields gains of up to 2.04% in Exact Match and 1.74% in F1. Finally, we conclude with an analysis of the transfer process, investigating the types of edits made by the model and the effect of sentence length on the transfer model.
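To make the offset-finding step and the two reported metrics concrete, here is a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the offset search is a plain substring lookup assumed only for illustration, and the Exact Match and F1 functions follow the normalization conventionally used for SQuAD-style evaluation (lowercasing, stripping punctuation and English articles, collapsing whitespace).

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def find_answer_offset(context: str, answer: str) -> int:
    """Hypothetical helper: character offset of the answer in a (simplified)
    context, or -1 if the answer span did not survive simplification."""
    return context.find(answer)


if __name__ == "__main__":
    # Made-up example context, not taken from Simple-SQuAD.
    simplified = "The tower is in Paris. It opened in 1889."
    print(find_answer_offset(simplified, "Paris"))
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(f1_score("in Paris France", "Paris"))
```

A return value of -1 from `find_answer_offset` marks a question whose answer no longer appears verbatim after simplification. How such cases are handled in the actual dataset is not stated in the abstract.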
