In this work, we focus on sentence splitting, a subfield of text simplification, motivated largely by the unproven idea that dividing a sentence into pieces makes it easier to understand. Our primary goal in this paper is to find out whether this is true. In particular, we ask whether it matters if a sentence is broken into two pieces or three. We report findings based on a study conducted on Amazon Mechanical Turk. More specifically, we introduce a Bayesian modeling framework to investigate the degree to which a particular way of splitting a complex sentence affects readability, along with a number of other parameters adopted from diverse perspectives, including clinical linguistics and cognitive linguistics. The Bayesian modeling experiment provides clear evidence that bisecting a sentence enhances readability to a greater degree than trisecting it.