Language Invariant Properties in Natural Language Processing

Meaning is context-dependent, but many properties of language (should) remain the same even when we transform the context. For example, sentiment, entailment, and speaker properties should be the same in a text and in its translation. We introduce language invariant properties, i.e., properties that should not change when we transform text, and show how they can be used to quantitatively evaluate the robustness of transformation algorithms. We use translation and paraphrasing as example transformations, but our findings apply more broadly to any transformation. Our results indicate that many NLP transformations change properties such as author characteristics, i.e., they make authors sound more male. We believe that studying these properties will allow NLP to address both social factors and pragmatic aspects of language. We also release an application suite that can be used to evaluate the invariance of transformation applications.
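As an illustration of what a quantitative invariance check could look like, the following is a minimal sketch and not the released application suite. It assumes that a property classifier (for example, a sentiment or author-gender classifier) has already labeled each original text and its transformed counterpart. The function names `label_distribution`, `kl_divergence`, and `invariance_report` are hypothetical, and the choice of flip rate and KL divergence as measures is an assumption made for this example.

```python
import math
from collections import Counter


def label_distribution(labels, classes, smoothing=1e-9):
    """Return a smoothed relative frequency for each class in `classes`."""
    counts = Counter(labels)
    total = len(labels) + smoothing * len(classes)
    return {c: (counts[c] + smoothing) / total for c in classes}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same classes."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p)


def invariance_report(original_labels, transformed_labels):
    """Compare property labels predicted before and after a transformation.

    Both arguments are hypothetical classifier outputs, aligned by position:
    item i of `transformed_labels` is the label predicted for the transformed
    version of text i.
    """
    if not original_labels:
        raise ValueError("need at least one labeled text")
    if len(original_labels) != len(transformed_labels):
        raise ValueError("label lists must be aligned and of equal length")

    classes = sorted(set(original_labels) | set(transformed_labels))
    p = label_distribution(original_labels, classes)
    q = label_distribution(transformed_labels, classes)
    flips = sum(a != b for a, b in zip(original_labels, transformed_labels))

    return {
        "flip_rate": flips / len(original_labels),  # per-text label changes
        "kl": kl_divergence(p, q),                  # shift in label distribution
        "original_dist": p,
        "transformed_dist": q,
    }


# Toy labels (made up for illustration, not results from any experiment).
original = ["female", "female", "male", "male"]
transformed = ["female", "male", "male", "male"]
report = invariance_report(original, transformed)
print(report["flip_rate"], report["kl"])
```

A transformation that preserved the property perfectly would give a flip rate of 0 and a KL divergence of 0 under this sketch. Larger values suggest that the transformation shifts the property, and the transformed distribution shows in which direction.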
