Every natural text is written in some style. Style is formed by a complex combination of stylistic factors, including formality markers, emotions, and metaphors, and one cannot fully understand a text without considering them. These factors combine and co-vary in complex ways to form styles. Studying the nature of these co-varying combinations sheds light on stylistic language in general, an area sometimes called cross-style language understanding. This paper provides a benchmark corpus (xSLUE) that combines existing datasets and collects a new one for sentence-level cross-style language understanding and evaluation. The benchmark contains text in 15 different styles under four proposed theoretical groupings: figurative, personal, affective, and interpersonal. For valid evaluation, we collect an additional diagnostic set by annotating all 15 styles on the same text. Using xSLUE, we propose three cross-style applications: classification, correlation, and generation. First, our proposed cross-style classifier, trained on multiple styles jointly, improves overall classification performance over individually trained style classifiers. Second, our study shows that some styles are highly dependent on each other in human-written text. Finally, we find that combining some contradictory styles is likely to generate stylistically less appropriate text. We believe our benchmark and case studies help explore interesting future directions for cross-style research. The preprocessed datasets and code are publicly available.