This paper considers the problem of estimating exposure to information in a social network. Given a piece of information (e.g., the URL of a news article on Facebook or a hashtag on Twitter), our aim is to find the fraction of people on the network who have been exposed to it. The exact exposure to a piece of information is determined by two features: the structure of the underlying social network and the set of people who shared the piece of information. Often, neither feature is publicly available (i.e., access to them is limited to the internal administrators of the platform), and both are difficult to estimate from data. As a solution, we propose two methods that estimate the exposure to a piece of information in an unbiased manner: a vanilla method based on sampling the network uniformly, and a method, motivated by the Friendship Paradox, that samples the network non-uniformly. We provide theoretical results that characterize the conditions (in terms of properties of the network and the piece of information) under which one method outperforms the other. Further, we outline extensions of the proposed methods to dynamic information cascades (where the exposure needs to be tracked in real time). We demonstrate the practical feasibility of the proposed methods via experiments on multiple synthetic and real-world datasets.
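To make the contrast between the two sampling schemes concrete, the following is a minimal sketch and not the estimators defined in the paper. It assumes that a person counts as exposed when at least one of their neighbors shared the item, that the friendship-paradox scheme draws a random end of a random link (so a node is picked with probability proportional to its degree), and that the mean degree is known so the draw can be reweighted to remain unbiased. All function names and the toy graph are hypothetical.

```python
import random


def is_exposed(adj, sharers, v):
    """Assumed definition: v is exposed if at least one neighbor shared."""
    return any(u in sharers for u in adj[v])


def exact_exposure(adj, sharers):
    """Ground truth; needs the full graph and the full sharer set."""
    return sum(is_exposed(adj, sharers, v) for v in adj) / len(adj)


def uniform_estimate(adj, sharers, n, rng):
    """Vanilla scheme: average the exposure indicator over uniform node draws."""
    nodes = list(adj)
    hits = sum(is_exposed(adj, sharers, rng.choice(nodes)) for _ in range(n))
    return hits / n


def friend_estimate(adj, sharers, n, rng):
    """Degree-proportional scheme: draw a random end of a random link.

    Each draw is reweighted by mean_degree / degree, a standard importance
    weight that cancels the bias toward high-degree nodes. The mean degree
    is assumed known here; in practice it would have to be estimated.
    """
    # Node v appears deg(v) times, so a uniform pick is degree-proportional.
    endpoints = [v for v in adj for _ in adj[v]]
    mean_degree = len(endpoints) / len(adj)
    total = 0.0
    for _ in range(n):
        y = rng.choice(endpoints)
        if is_exposed(adj, sharers, y):
            total += mean_degree / len(adj[y])
    return total / n


# Toy undirected graph (hypothetical): a hub 0 with a short tail 5-6-7.
adj = {
    0: [1, 2, 3, 4, 5],
    1: [0], 2: [0], 3: [0], 4: [0],
    5: [0, 6],
    6: [5, 7],
    7: [6],
}
sharers = {0}  # only the hub shared the item

rng = random.Random(0)
truth = exact_exposure(adj, sharers)
est_uniform = uniform_estimate(adj, sharers, 20000, rng)
est_friend = friend_estimate(adj, sharers, 20000, rng)
print(truth, est_uniform, est_friend)
```

In this toy graph both estimates target the same quantity; which of the two has lower variance depends on the network and on who shared, which is the kind of condition the theoretical results address.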