Online news and social media have been the de facto media for disseminating information globally since the beginning of the last decade. However, bias in content and intent is not regulated, and managing bias is left to content consumers. In this regard, understanding the stances and biases of news sources toward specific entities becomes important. To address this problem, we use pretrained language models, which have been shown to yield good results with no task-specific training or with few-shot training. In this work, we approach the problem of characterizing Named Entities and Tweets as an open-ended text classification and open-ended fact probing problem. We evaluate the zero-shot capabilities of the Generative Pretrained Transformer 2 (GPT-2) language model to characterize Entities and Tweets subjectively, using human-psychology-inspired and logical conditional prefixes and contexts. First, we fine-tune the GPT-2 model on a sufficiently large news corpus and evaluate the subjective characterization of popular entities in the corpus by priming the model with prefixes. Second, we fine-tune GPT-2 on a corpus of Tweets from a few popular hashtags and evaluate the characterization of tweets by priming the language model with prefixes, questions, and contextual synopsis prompts. Entity characterization results were positive across measures and in human evaluation.
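The prefix-priming setup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the entity names and prefix templates below are hypothetical placeholders, since the abstract does not list the exact prefixes used.

```python
# Hedged sketch of building priming prompts for subjective entity
# characterization. Entities and templates are illustrative assumptions.
ENTITIES = ["Entity A", "Entity B"]
PREFIX_TEMPLATES = [
    "In my opinion, {entity} is",
    "Most people believe that {entity} is",
]

def build_prompts(entities, templates):
    """Cross every entity with every prefix template to form prompts."""
    return [t.format(entity=e) for e in entities for t in templates]

prompts = build_prompts(ENTITIES, PREFIX_TEMPLATES)
for p in prompts:
    print(p)
```

Each prompt would then be fed to the fine-tuned GPT-2 model (e.g., via a library such as Hugging Face `transformers`, calling `model.generate` on the tokenized prefix) and the completion inspected as the model's subjective characterization of the entity.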