Self-Supervised Euphemism Detection and Identification for Content Moderation

Fringe groups and organizations have a long history of using euphemisms (ordinary-sounding words with a secret meaning) to conceal what they are discussing. Nowadays, one common use of euphemisms is to evade content moderation policies enforced by social media platforms. Existing tools for enforcing policy automatically rely on keyword searches for words on a "ban list", but these are notoriously imprecise: even when limited to swearwords, they can still cause embarrassing false positives. When a commonly used ordinary word acquires a euphemistic meaning, adding it to a keyword-based ban list is hopeless: consider "pot" (storage container or marijuana?) or "heater" (household appliance or firearm?). The current generation of social media companies instead hire staff to check posts manually, but this is expensive, inhumane, and not much more effective. It is usually apparent to a human moderator that a word is being used euphemistically, but they may not know what the secret meaning is, and therefore whether the message violates policy. Also, when a euphemism is banned, the group that used it need only invent another one, leaving moderators one step behind. This paper demonstrates unsupervised algorithms that, by analyzing words in their sentence-level context, can both detect words being used euphemistically and identify the secret meaning of each word. Compared to the existing state of the art, which uses context-free word embeddings, our euphemism detection algorithm achieves 30-400% higher accuracy in detecting unlabeled euphemisms in a text corpus. As far as we are aware, our algorithm for revealing the euphemistic meanings of words is the first of its kind. In the arms race between content moderators and policy evaders, our algorithms may help shift the balance in the direction of the moderators.
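The gap between a keyword ban list and context-based analysis can be illustrated with a toy Python sketch. Everything in it is hypothetical: the ban list, the example sentences, and the helper names (`keyword_flag`, `context_score`). The bag-of-words cosine score is a deliberately simple stand-in for the sentence-level context models the abstract refers to; it is not the paper's algorithm. It only shows why looking at the surrounding words can separate an innocent "pot" from a euphemistic one when a ban list cannot.

```python
import math
import re
from collections import Counter

# Hypothetical ban list, for illustration only.
BAN_LIST = {"pot", "heater"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_flag(sentence, ban_list=BAN_LIST):
    """Context-free filter: flag the sentence if any token is on the ban list."""
    return any(tok in ban_list for tok in tokenize(sentence))


def context_vector(sentence, target):
    """Bag-of-words counts of the words around `target` (the target itself is dropped)."""
    return Counter(tok for tok in tokenize(sentence) if tok != target)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def context_score(sentence, target, reference_sentences, reference_term):
    """Compare the context of `target` with the pooled contexts of a known
    sensitive term in reference sentences; higher means more similar usage."""
    reference = Counter()
    for ref in reference_sentences:
        reference.update(context_vector(ref, reference_term))
    return cosine(context_vector(sentence, target), reference)


# Hypothetical reference sentences in which the sensitive term appears openly.
REFERENCE = [
    "selling marijuana cheap dm me",
    "smoke marijuana all day",
    "need marijuana plug dm",
]
INNOCENT = "cook the soup in a big pot"
EUPHEMISTIC = "selling pot cheap dm me"

if __name__ == "__main__":
    for s in (INNOCENT, EUPHEMISTIC):
        score = context_score(s, "pot", REFERENCE, "marijuana")
        print(f"{s!r}: flagged={keyword_flag(s)}, context_score={score:.2f}")
```

On this toy data the ban list flags both sentences, while only the second one shares context with the reference uses of "marijuana" and so receives a non-zero score. A practical system would replace the word counts with a learned model of sentence-level context and a large corpus; the sketch only conveys the intuition.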
