Information flow estimation: a study of news on Twitter

News media has long been an ecosystem of creation, reproduction, and critique, where news outlets report on current events and add commentary to ongoing stories. Understanding the dynamics of news information creation and dispersion is important for accurately ascribing credit to influential work and for understanding how societal narratives develop. These dynamics can be modelled through a combination of information-theoretic natural language processing and networks, and can be parameterised using large quantities of textual data. However, it is challenging to see "the wood for the trees", i.e., to detect small but important flows of information in a sea of noise. Here we develop new comparative techniques to estimate temporal information flow between pairs of text producers. Using both simulated and real text data, we compare the reliability and sensitivity of methods for estimating textual information flow, showing that a metric that normalises by local neighbourhood structure provides a robust estimate of information flow in large networks. We apply this metric to a large corpus of news organisations on Twitter and demonstrate its usefulness in identifying influence within an information ecosystem, finding that average information contribution to the network is not correlated with the number of followers or the number of tweets. This suggests that small local organisations and right-wing organisations, which have lower average follower counts, still contribute significant information to the ecosystem. Further, the methods are applied to smaller full-text datasets covering specific news events across news sites and Russian troll bots on Twitter. The information flow estimation reveals and quantifies features of how these events develop and the role of groups of bots in setting disinformation narratives.
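To make "temporal information flow between pairs of text producers" concrete, the following is a minimal sketch of one match-length cross-entropy estimator of the kind used in information-theoretic text analysis. The abstract does not state which estimator is used, so the choice of estimator, the per-position history normalisation, and the names `longest_match` and `cross_entropy_rate` are all assumptions made for illustration. It does not implement the neighbourhood-normalised metric described above. A lower value means the target's text is more predictable from what the source wrote earlier.

```python
import math
from bisect import bisect_left


def longest_match(target, i, history):
    """Length of the longest prefix of target[i:] found contiguously in history."""
    best = 0
    n = len(history)
    for start in range(n):
        k = 0
        while (start + k < n and i + k < len(target)
               and history[start + k] == target[i + k]):
            k += 1
        best = max(best, k)
    return best


def cross_entropy_rate(source, target):
    """Illustrative (assumed) estimator, in bits per token.

    source, target: lists of (timestamp, token), each sorted by timestamp.
    For each target token, only source tokens with a strictly earlier
    timestamp count as history, which is what makes the estimate temporal.
    """
    src_times = [t for t, _ in source]
    src_tokens = [w for _, w in source]
    tgt_tokens = [w for _, w in target]

    log_history = 0.0  # sum of log2(history length) over usable positions
    match_total = 0    # sum of (longest match length + 1)
    for i, (t, _) in enumerate(target):
        n_hist = bisect_left(src_times, t)  # source tokens strictly before t
        if n_hist == 0:
            continue  # nothing the source said could have informed this token
        history = src_tokens[:n_hist]
        match_total += longest_match(tgt_tokens, i, history) + 1
        log_history += math.log2(n_hist)

    if match_total == 0:
        return math.inf  # the source never precedes the target
    return log_history / match_total


# Hypothetical toy data: one target repeats the source later, one does not.
source = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
copy = [(10, "a"), (11, "b"), (12, "c"), (13, "d")]
unrelated = [(10, "w"), (11, "x"), (12, "y"), (13, "z")]

h_copy = cross_entropy_rate(source, copy)
h_unrelated = cross_entropy_rate(source, unrelated)
print(h_copy < h_unrelated)
```

The quadratic search in `longest_match` keeps the sketch short. A real corpus would need a suffix-based index, and the pairwise values would still need the kind of normalisation the abstract describes before they could be compared across a large network.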
