Recently, there have been considerable efforts to use online data to investigate international migration. These efforts show that Web data are valuable for estimating migration rates and are relatively easy to obtain. However, existing studies have only investigated flows of people along migration corridors, i.e., between pairs of countries. In our work, we use data about "places lived" from millions of Google+ users to study migration "clusters", i.e., groups of countries in which individuals have lived. For the first time, we consider information about more than two countries in which people have lived. We argue that these data are very valuable because this type of information is not available in traditional demographic sources, which record country-to-country migration flows independently of one another. We show that migration clusters of country triads cannot be identified using information about bilateral flows alone. To demonstrate the additional insights that can be gained from data about migration clusters, we first develop a model that tries to predict the prevalence of a given triad using only data about its constituent pairs. We then inspect the groups of three countries that are more or less prominent than we would expect based on bilateral flows alone. Next, we identify a set of features, such as a shared language or colonial ties, that explain which country triples are more or less likely to be clustered. Then we select and contrast a few cases of clusters that provide qualitative information about what our data set shows. The type of data that we use is potentially available from a number of social media services. We hope that this first study of migration clusters will stimulate the use of Web data for the development of new theories of international migration that could not be tested appropriately before.
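The idea of comparing a triad's observed prevalence with what its constituent pairs would suggest can be illustrated with a minimal sketch. The abstract does not specify the model, so the baseline used here (the geometric mean of the three pair counts) is purely an assumed stand-in, and the function names `count_clusters` and `triad_scores` are hypothetical. The input is assumed to be one list of "places lived" countries per user.

```python
from collections import Counter
from itertools import combinations
from math import prod


def count_clusters(users, size):
    """Count, for each group of `size` countries, the users who lived in all of them."""
    counts = Counter()
    for places in users:
        # Deduplicate and sort so that each group has one canonical key.
        for group in combinations(sorted(set(places)), size):
            counts[group] += 1
    return counts


def triad_scores(users):
    """Ratio of observed triad count to a pair-based baseline.

    Assumption: the baseline is the geometric mean of the three pair counts.
    This is an illustrative placeholder, not the model from the paper.
    """
    pairs = count_clusters(users, 2)
    triads = count_clusters(users, 3)
    scores = {}
    for triad, observed in triads.items():
        pair_counts = [pairs[p] for p in combinations(triad, 2)]
        baseline = prod(pair_counts) ** (1 / 3)
        scores[triad] = observed / baseline
    return scores


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    users = [["US", "CA", "MX"], ["US", "CA"], ["US", "MX", "CA"], ["FR", "DE"]]
    for triad, score in triad_scores(users).items():
        print(triad, round(score, 3))
```

Under this assumed baseline, triads with a higher ratio are more prominent relative to their bilateral counts than triads with a lower ratio. A fitted regression on pair-level data would be a more realistic substitute for the geometric mean.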