Ideological divisions in the United States have become increasingly prominent in daily communication. Accordingly, there has been much research on political polarization, including many recent efforts that take a computational perspective. By detecting political biases in a corpus of text, one can attempt to describe and discern the polarity of that text. Intuitively, the named entities (i.e., nouns and phrases that act as nouns) and hashtags in text often carry information about political views. For example, people who use the term "pro-choice" are likely to be liberal, whereas people who use the term "pro-life" are likely to be conservative. In this paper, we seek to reveal political polarities in social-media text data and to quantify these polarities by explicitly assigning a polarity score to entities and hashtags. Although this idea is straightforward, it is difficult to perform such inference in a trustworthy quantitative way. Key challenges include the small number of known labels, the continuous spectrum of political views, and the need to preserve both a polarity score and polarity-neutral semantic meaning in a word's embedding vector. To attempt to overcome these challenges, we propose the Polarity-aware Embedding Multi-task learning (PEM) model. This model consists of (1) a self-supervised context-preservation task, (2) an attention-based tweet-level polarity-inference task, and (3) an adversarial learning task that promotes independence between an embedding's polarity dimension and its semantic dimensions. Our experimental results demonstrate that our PEM model can successfully learn polarity-aware embeddings that perform well on classification tasks. We examine a variety of applications, thereby demonstrating the effectiveness of our PEM model. We also discuss important limitations of our work and encourage caution when applying it to real-world scenarios.
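To make the idea of an attention-based, tweet-level polarity inference over polarity-aware embeddings concrete, here is a minimal illustrative sketch. It is not the paper's implementation: the embedding layout (reserving the last dimension of each token embedding as its polarity score), the token values, and the attention logits are all hypothetical, chosen only to show how attention weights could aggregate token-level polarity scores into a tweet-level score.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of attention logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tweet_polarity(token_embeddings, attention_logits):
    """Hypothetical aggregation: each token embedding reserves its last entry
    as a polarity score, and the tweet-level polarity is the attention-weighted
    average of those per-token scores."""
    weights = softmax(attention_logits)
    return sum(w * emb[-1] for w, emb in zip(weights, token_embeddings))

# Toy tweet with three tokens; polarity scores run from -1 (liberal-leaning)
# to +1 (conservative-leaning). All values below are made up for illustration.
emb = [
    [0.2, 0.5, -0.9],  # e.g., a "pro-choice"-like token
    [0.1, 0.3,  0.0],  # a polarity-neutral token
    [0.4, 0.1,  0.8],  # e.g., a "pro-life"-like token
]
score = tweet_polarity(emb, attention_logits=[2.0, 0.1, 0.5])
```

Because the attention logits put most of the weight on the first (liberal-leaning) token, the aggregated tweet-level score comes out negative in this toy example; swapping the logits would shift it toward the conservative end.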