HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition

The use of Bidirectional Encoder Representations from Transformers (BERT) models for natural language processing (NLP) tasks, and for sentiment analysis in particular, has become very popular in recent years, and with good reason. Social media use is constantly on the rise, and its impact on all areas of our lives is hard to overstate. Research shows that social media now serves as one of the main venues where people freely express their ideas, opinions, and emotions. During the current Covid-19 pandemic, the role of social media as a channel for voicing opinions and emotions has become even more prominent. This paper introduces HeBERT and HebEMO. HeBERT is a transformer-based model for modern Hebrew text. Hebrew is considered a Morphologically Rich Language (MRL), with unique characteristics that pose a great challenge for developing appropriate Hebrew NLP models. After analyzing multiple specifications of the BERT architecture, we arrive at a language model that outperforms all existing Hebrew alternatives on multiple language tasks. HebEMO is a tool that uses HeBERT to detect polarity and extract emotions from Hebrew user-generated content (UGC). It was trained on a unique Covid-19-related dataset that we collected and annotated for this study. Data collection and annotation followed an innovative iterative semi-supervised process that aimed to maximize predictability. HebEMO achieved a weighted average F1-score of 0.96 for polarity classification. Emotion detection reached F1-scores of 0.78-0.97, with the exception of surprise, which the model failed to capture (F1 = 0.41). These results exceed the best reported performance, even when compared with results for the English language.
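For readers unfamiliar with the metric, the weighted average F1-score quoted for polarity classification can be illustrated with a minimal sketch. It assumes the standard definition: the F1-score of each class, averaged with weights proportional to the number of true examples of that class. The function name `weighted_f1` and the toy labels are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)          # number of true examples per class
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in sorted(support):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0   # F1 = 2TP / (2TP + FP + FN)
        score += f1 * support[label] / total
    return score


# Toy polarity labels, invented purely for illustration.
y_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "neutral", "negative", "positive"]
print(round(weighted_f1(y_true, y_pred), 3))
```

Because each class is weighted by its support, frequent classes dominate the average, which is worth keeping in mind when a rare class (such as surprise in the emotion results) scores much lower than the rest.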

