Euphemisms have received little attention in natural language processing, despite being an important element of polite and figurative language. They are a difficult topic, not only because they are subject to language change, but also because humans may not agree on what is and is not a euphemism. Nevertheless, the first step in tackling the issue is to collect and analyze examples of euphemisms. We present a corpus of potentially euphemistic terms (PETs) along with example texts from the GloWbE corpus. Additionally, we present a subcorpus of texts in which these PETs are not used euphemistically, which may be useful for future applications. We also discuss the results of multiple analyses run on the corpus. First, we find that sentiment analysis on the euphemistic texts supports the view that PETs generally decrease negative and offensive sentiment. Second, we observe cases of disagreement in an annotation task in which humans are asked to label PETs as euphemistic or not in a subset of our corpus text examples. We attribute the disagreement to a variety of potential reasons, including whether the PET is a commonly accepted term (CAT).