以蒙面语言模式探测语句 (Euphemistic Phrase Detection by Masked Language Model) - 专知论文

会员服务 ·

0

掩码语言模型化 · 语言模型化 · 掩码 · MoDELS · MINE ·

2021 年 9 月 10 日

Euphemistic Phrase Detection by Masked Language Model

翻译：以蒙面语言模式探测语句

Wanzheng Zhu,Suma Bhat

It is a well-known approach for fringe groups and organizations to use euphemisms -- ordinary-sounding and innocent-looking words with a secret meaning -- to conceal what they are discussing. For instance, drug dealers often use "pot" for marijuana and "avocado" for heroin. From a social media content moderation perspective, though recent advances in NLP have enabled the automatic detection of such single-word euphemisms, no existing work is capable of automatically detecting multi-word euphemisms, such as "blue dream" (marijuana) and "black tar" (heroin). Our paper tackles the problem of euphemistic phrase detection without human effort for the first time, as far as we are aware. We first perform phrase mining on a raw text corpus (e.g., social media posts) to extract quality phrases. Then, we utilize word embedding similarities to select a set of euphemistic phrase candidates. Finally, we rank those candidates by a masked language model -- SpanBERT. Compared to strong baselines, we report 20-50% higher detection accuracies using our algorithm for detecting euphemistic phrases.

翻译：对于边缘群体和组织来说,一种众所周知的方法是使用委婉主义 -- -- 普通的、能听懂的和无辜的、隐秘的言词 -- -- 来掩盖他们所讨论的内容。例如,毒贩经常用“罐子”来掩盖大麻,用“avocado”来掩盖海洛因。从社交媒体内容的温和观点来看,虽然NLP最近的进展使得这种单词委婉主义能够自动发现,但没有任何现有工作能够自动发现多词委婉主义,例如“蓝梦”(marijuana)和“黑焦油”(heroin)。据我们所知,我们的文件首次解决了未经人类努力而探测的委婉词的问题。我们首先用原始文字材料(例如社交媒体文章)来挖掘高质量的词句。然后,我们用语言嵌入相似性词来选择一组委婉用语的候选人。最后,我们用隐蔽语言模型来将这些候选人排位 -- SpanBERT。与强的基线相比,我们报告说,20-50% 更高的查谎话短语使用我们测算算法进行20-50%。

0

相关内容

掩码语言模型化

掩码语言模型化

深度概率图模型，Deep Probabilistic Models

专知会员服务

29+阅读 · 2021年8月2日

【CVPR2021】基于相似性分布距离的无监督人脸图像质量评价

专知会员服务

32+阅读 · 2021年3月19日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【NeurlPS2019论文强烈推荐】vGraph:联合社区检测和节点表示学习的生成模型，vGraph: A Generative Model for Joint Community Detection and Node Representational Learning

【NeurlPS2019论文强烈推荐】vGraph:联合社区检测和节点表示学习的生成模型，vGraph: A Generative Model for Joint Community Detection and Node Representational Learning

专知会员服务

30+阅读 · 2019年12月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

已删除

将门创投

11+阅读 · 2019年7月4日

Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model

Arxiv

0+阅读 · 2021年11月2日

ASMDD: Arabic Speech Mispronunciation Detection Dataset

Arxiv

0+阅读 · 2021年11月1日

Multi-Task Learning with Sentiment, Emotion, and Target Detection to Recognize Hate Speech and Offensive Language

Arxiv

0+阅读 · 2021年10月29日

Pix2seq: A Language Modeling Framework for Object Detection

Arxiv

10+阅读 · 2021年9月22日

Object Detection in 20 Years: A Survey

Object Detection in 20 Years: A Survey

Arxiv

48+阅读 · 2019年5月13日

Deep Learning for Generic Object Detection: A Survey

Deep Learning for Generic Object Detection: A Survey

Arxiv

14+阅读 · 2018年9月6日

Neural Models for Key Phrase Detection and Question Generation

Arxiv

4+阅读 · 2018年5月30日

Transferring Common-Sense Knowledge for Object Detection

Arxiv

12+阅读 · 2018年4月3日

Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation

Arxiv

6+阅读 · 2018年3月30日

Natural Language Guided Visual Relationship Detection

Arxiv

3+阅读 · 2017年11月21日

VIP会员

文章信息

相关主题

掩码语言模型化

语言模型化

相关VIP内容

深度概率图模型，Deep Probabilistic Models

专知会员服务

29+阅读 · 2021年8月2日

【CVPR2021】基于相似性分布距离的无监督人脸图像质量评价

专知会员服务

32+阅读 · 2021年3月19日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【NeurlPS2019论文强烈推荐】vGraph:联合社区检测和节点表示学习的生成模型，vGraph: A Generative Model for Joint Community Detection and Node Representational Learning

【NeurlPS2019论文强烈推荐】vGraph:联合社区检测和节点表示学习的生成模型，vGraph: A Generative Model for Joint Community Detection and Node Representational Learning

专知会员服务

30+阅读 · 2019年12月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

最新《扩散模型原理》新书，470页pdf

无人机作战：演进、创新与未来战场

AI 智能体简史

多模态空间推理在大模型时代：综述与基准测试

相关资讯

已删除

将门创投

11+阅读 · 2019年7月4日

相关论文

Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model

Arxiv

0+阅读 · 2021年11月2日

ASMDD: Arabic Speech Mispronunciation Detection Dataset

Arxiv

0+阅读 · 2021年11月1日

Multi-Task Learning with Sentiment, Emotion, and Target Detection to Recognize Hate Speech and Offensive Language

Arxiv

0+阅读 · 2021年10月29日

Pix2seq: A Language Modeling Framework for Object Detection

Arxiv

10+阅读 · 2021年9月22日

Object Detection in 20 Years: A Survey

Object Detection in 20 Years: A Survey

Arxiv

48+阅读 · 2019年5月13日

Deep Learning for Generic Object Detection: A Survey

Deep Learning for Generic Object Detection: A Survey

Arxiv

14+阅读 · 2018年9月6日

Neural Models for Key Phrase Detection and Question Generation

Arxiv

4+阅读 · 2018年5月30日

Transferring Common-Sense Knowledge for Object Detection

Arxiv

12+阅读 · 2018年4月3日

Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation

Arxiv

6+阅读 · 2018年3月30日

Natural Language Guided Visual Relationship Detection

Arxiv

3+阅读 · 2017年11月21日

微信扫码咨询专知VIP会员