BERT Transformer model for Detecting Arabic GPT2 Auto-Generated Tweets

During the last two decades, we have progressively turned to the Internet and social media to find news, hold conversations, and share opinions. Recently, OpenAI developed a machine learning system called GPT-2 (Generative Pre-trained Transformer 2), which can produce deepfake texts. Given a brief writing prompt, it can generate blocks of text that look as though they were written by humans, facilitating the spread of false or auto-generated text. In line with this progress, and to counteract potential dangers, several methods have been proposed for detecting text written by these language models. In this paper, we propose a transfer-learning-based model that detects whether an Arabic sentence was written by a human or automatically generated by a bot. Our dataset is based on tweets from a previous work, which we crawled and extended using the Twitter API. We used GPT2-Small-Arabic to generate fake Arabic sentences. For evaluation, we compared several baseline models built on recurrent neural networks (RNNs) with word embeddings, namely LSTM, BI-LSTM, GRU and BI-GRU, with a transformer-based model. Our new transfer-learning model obtained an accuracy of up to 98%. To the best of our knowledge, this work is the first study in which ARABERT and GPT2 were combined to detect and classify Arabic auto-generated texts.
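The evaluation setup described in the abstract (human tweets versus generated sentences, scored by accuracy) can be illustrated with a minimal sketch. This is not the authors' code: the function names `build_dataset`, `train_test_split`, and `accuracy`, the 80/20 split, and the placeholder texts are all assumptions made for illustration. In a real pipeline, `predict` would presumably wrap a fine-tuned transformer or one of the RNN baselines.

```python
import random

# Label convention assumed for this sketch (not specified in the abstract).
HUMAN, GENERATED = 0, 1


def build_dataset(human_texts, generated_texts, seed=0):
    """Attach a label to every text and shuffle the combined list."""
    data = [(text, HUMAN) for text in human_texts]
    data += [(text, GENERATED) for text in generated_texts]
    random.Random(seed).shuffle(data)  # seeded for reproducibility
    return data


def train_test_split(data, test_ratio=0.2):
    """Hold out the last `test_ratio` share of the shuffled data for testing."""
    n_test = round(len(data) * test_ratio)
    cut = len(data) - n_test
    return data[:cut], data[cut:]


def accuracy(predict, examples):
    """Fraction of (text, label) pairs for which predict(text) equals label."""
    if not examples:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


if __name__ == "__main__":
    # Placeholder strings stand in for crawled tweets and GPT-2 output.
    human = ["human tweet 1", "human tweet 2", "human tweet 3"]
    generated = ["generated text 1", "generated text 2"]
    train, test = train_test_split(build_dataset(human, generated))

    # Trivial stand-in classifier; a trained model would replace this.
    def predict(text):
        return GENERATED if text.startswith("generated") else HUMAN

    print(f"test accuracy: {accuracy(predict, test):.2f}")
```

Keeping `accuracy` independent of the model means the same scoring function can be applied to every classifier being compared, which is what makes the reported figures comparable.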

