The use of Bidirectional Encoder Representations from Transformers (BERT) models for various natural language processing (NLP) tasks, and for sentiment analysis in particular, has become very popular in recent years, and for good reason. Social media use is constantly rising, and its impact on all areas of our lives is almost inconceivable. Research shows that social media now serves as one of the main tools through which people freely express their ideas, opinions, and emotions. During the current Covid-19 pandemic, the role of social media as a channel for voicing opinions and emotions became even more prominent. This paper introduces HeBERT and HebEMO. HeBERT is a transformer-based model for modern Hebrew text. Hebrew is considered a Morphologically Rich Language (MRL), with unique characteristics that pose a great challenge in developing appropriate Hebrew NLP models. By analyzing multiple specifications of the BERT architecture, we arrive at a language model that outperforms all existing Hebrew alternatives on multiple language tasks. HebEMO is a tool that uses HeBERT to detect polarity and extract emotions from Hebrew user-generated content (UGC). It was trained on a unique Covid-19-related dataset that we collected and annotated for this study. Data collection and annotation followed an innovative, iterative, semi-supervised process aimed at maximizing predictability. HebEMO achieved a weighted-average F1-score of 0.96 for polarity classification. Emotion detection reached F1-scores of 0.78-0.97, with the exception of \textit{surprise}, which the model failed to capture (F1 = 0.41). These results are better than the best reported performance, even when compared with results for English.