Data-driven approaches have been applied to many problems in urban computing. However, in the research community, such approaches are commonly studied using data from a limited number of sources, and thus fail to capture the complexity of urban data that come from multiple entities and the correlations among them. Consequently, an inclusive and multifaceted dataset is necessary to facilitate more extensive studies on urban computing. In this paper, we present CityNet, a multi-modal urban dataset containing data from 7 cities, with the data for each city coming from 3 data sources. We first present the generation process of CityNet as well as its basic properties. In addition, to facilitate the use of CityNet, we carry out extensive machine learning experiments, including spatio-temporal prediction, transfer learning, and reinforcement learning. The experimental results not only provide benchmarks for a wide range of tasks and methods, but also uncover internal correlations among cities and tasks within CityNet that, when properly leveraged, can improve performance on various tasks. With the benchmarking results and the correlations uncovered, we believe that CityNet can contribute to the field of urban computing by supporting research on many advanced topics.