Location data collected from mobile devices represent mobility behaviors at individual and societal levels. These data have important applications ranging from transportation planning to epidemic modeling. However, two issues must be overcome to serve these use cases well: the data often represent a limited sample of the population, and use of the data jeopardizes privacy. To address these issues, we present and evaluate a system for generating synthetic mobility data using a deep recurrent neural network (RNN) that is trained on real location data. The system takes a population distribution as input and generates mobility traces for a corresponding synthetic population. Related generative approaches have not solved the challenges of capturing both the patterns and the variability in individuals' mobility behaviors over longer time periods while also balancing the generation of realistic data with privacy. Our system leverages the ability of RNNs to generate complex and novel sequences while retaining patterns from the training data. The model also introduces randomness that is used to calibrate the variation between the synthetic and real data at the individual level, both to capture variability in human mobility and to protect user privacy. Location-based services (LBS) data from more than 22,700 mobile devices were used in an experimental evaluation across utility and privacy metrics. We show that the generated mobility data retain the characteristics of the real data while varying from the real data at the individual level, and that the amount of this variation matches the variation within the real data.
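The abstract does not say how the randomness is introduced. As an illustration only, one common way to control variation when sampling from a sequence model is temperature scaling of the next-step distribution. The minimal sketch below assumes the model emits a list of scores (logits) over candidate locations; the function names `temperature_probs` and `sample_location` and the `temperature` parameter are hypothetical and are not taken from the described system.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Convert logits to probabilities via softmax(logits / temperature).

    A low temperature concentrates mass on the top-scoring location;
    a high temperature flattens the distribution (more variation).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_location(logits, temperature, rng):
    """Draw one location index from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    logits = [2.0, 1.0, 0.0]  # hypothetical scores for three locations
    for t in (0.1, 1.0, 10.0):
        print(t, [round(p, 3) for p in temperature_probs(logits, t)])
    print(sample_location(logits, 1.0, rng))
```

In a sketch like this, the single `temperature` knob trades fidelity to the learned patterns against individual-level variation, which is the kind of calibration the abstract describes in general terms.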