Information flow estimation: a study of news on Twitter

News media has long been an ecosystem of creation, reproduction, and critique, in which news outlets report on current events and add commentary to ongoing stories. Understanding how news information is created and dispersed matters for two reasons: it lets us ascribe credit accurately to influential work, and it shows how societal narratives develop. These dynamics can be modelled by combining information-theoretic natural language processing with network analysis, and the resulting models can be parameterised using large quantities of textual data. However, it is challenging to see "the wood for the trees", i.e., to detect small but important flows of information in a sea of noise. Here we develop new comparative techniques to estimate temporal information flow between pairs of text producers. Using both simulated and real text data, we compare the reliability and sensitivity of methods for estimating textual information flow, and show that a metric that normalises by local neighbourhood structure provides a robust estimate of information flow in large networks. We apply this metric to a large corpus of tweets from news organisations and demonstrate its usefulness in identifying influence within an information ecosystem. We find that an organisation's average information contribution to the network is not correlated with its number of followers or its number of tweets. This suggests that small local organisations and right-wing organisations, which have lower average follower counts, still contribute significant information to the ecosystem. Further, we apply the methods to smaller full-text datasets covering specific news events across news sites and Russian troll accounts on Twitter. The information flow estimates reveal and quantify features of how these events develop and the role of groups of trolls in setting disinformation narratives.
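The abstract does not say which estimator underlies the information flow measure or exactly how the neighbourhood normalisation works. Purely as an illustration, the sketch below assumes a match-length (Lempel-Ziv style) cross-entropy estimator of the form h(A|B) ≈ N_A log2(N_B) / Σ_i Λ_i, which is common in the information-flow literature. Here Λ_i is the length of the shortest run of target tokens starting at position i that does not appear in source text posted strictly earlier. The names `Posts`, `cross_entropy` and `normalised_flow` are hypothetical. The normalisation in `normalised_flow` is an assumed ratio against the target's mean over all its sources and is not necessarily the metric used in the study.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical representation of one text producer: (timestamp, tokens) per post.
Posts = List[Tuple[float, List[str]]]


def _occurs_in(seq: List[str], sub: List[str]) -> bool:
    """True if `sub` occurs as a contiguous run of tokens inside `seq`."""
    m = len(sub)
    return any(seq[k:k + m] == sub for k in range(len(seq) - m + 1))


def _lambda(tokens: List[str], i: int, past: List[str]) -> int:
    """Length of the shortest run starting at tokens[i] that is absent from `past`."""
    length = 1
    while i + length <= len(tokens) and _occurs_in(past, tokens[i:i + length]):
        length += 1
    return length


def cross_entropy(target: Posts, source: Posts) -> float:
    """Assumed match-length cross-entropy (bits/token) of `target` given `source`.

    Each target token is matched only against source posts with an earlier
    timestamp (temporal ordering). For simplicity, runs are matched within a
    single target post. The cost is quadratic, so this is for illustration only.
    """
    total, n_target = 0, 0
    for t, tokens in target:
        past = [w for s, toks in source if s < t for w in toks]
        for i in range(len(tokens)):
            total += _lambda(tokens, i, past)
            n_target += 1
    if n_target == 0:
        raise ValueError("target has no tokens")
    n_source = sum(len(toks) for _, toks in source)
    return n_target * math.log2(max(n_source, 2)) / total


def normalised_flow(h: Dict[Tuple[str, str], float], target: str, source: str) -> float:
    """HYPOTHETICAL neighbourhood normalisation.

    Divides the pair's cross-entropy by the mean cross-entropy of `target`
    over every source estimated for it (including `source` itself). Values
    below 1 mean `source` predicts `target` better than a typical neighbour.
    """
    neighbours = [v for (a, _), v in h.items() if a == target]
    return h[(target, source)] / mean(neighbours)
```

A lower cross-entropy means the source's earlier text makes the target's text more predictable, which is read as more information flowing from source to target. Because each target token is matched only against strictly earlier source posts, a source that posts identical text after the target shows no apparent flow.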

翻译：长期以来,新闻媒体一直是一个创造、复制和批评的生态系统,新闻机构在其中报道当前事件并增加对当前故事的评论。了解新闻信息创建和分散的动态对于准确地将有影响力的工作归功于有影响力的工作和理解社会叙事的发展方式非常重要。这些动态可以通过信息理论自然语言处理和网络相结合来模拟;并且可以使用大量文本数据进行参数化。然而,看到“树木的木头”是具有挑战性的,即在噪音的海洋中发现信息流的微小但重要的信息流。在这里,我们开发新的比较技术来估计一对文本制作者之间的时间信息流。我们利用模拟和真实文本数据来比较估算文本信息流的可靠性和敏感性。这些动态可以通过信息流的可靠性和敏感性来进行模拟。通过本地邻居结构的正常化为大型网络的信息流提供可靠的估计。我们将这一矩阵应用于大量在Twitter上的新闻组织,并展示其在识别信息生态系统内的影响方面的有用性,发现对网络的平均信息估算与追随者人数或推特上的推文数没有关联性。这说明,利用模拟和真实文本数据数据数据数据数据数据数据数据数据流的可靠性和准确性组织对具体信息流的准确性进行统计。