As news and social media exhibit an increasing amount of manipulative, polarized content, detecting such propaganda has received attention as a new task for content analysis. Prior work has focused on supervised learning with training data from the same domain. However, because propaganda can be subtle and keeps evolving, manual identification and proper labeling are very demanding. As a consequence, training data is a major bottleneck. In this paper, we tackle this bottleneck and present an approach that leverages cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches that clearly differ in how propagandistic they are. We devise informative features and build various classifiers for propaganda labeling. Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step. We further analyze the influence of various features, and characterize salient indicators of propaganda.