Statistical Efficiency of Travel Time Prediction

Modern mobile applications such as navigation services and ride-hailing platforms rely heavily on geospatial technologies, most critically on predictions of the time required for a vehicle to traverse a particular route. The two major categories of prediction methods are segment-based approaches, which predict travel time at the level of road segments and then aggregate across the route, and route-based approaches, which use generic information about the trip, such as origin and destination, to predict travel time. Though various forms of these methods have been developed and used, there has been no rigorous theoretical comparison of their accuracy, and empirical studies have in many cases drawn opposite conclusions. We fill this gap by conducting the first theoretical analysis that compares the two approaches in terms of their predictive accuracy as a function of the sample size of the training data (the statistical efficiency). We introduce a modeling framework and formally define a family of segment-based estimators and a family of route-based estimators that resemble many practical estimators proposed in the literature and used in practice. Under both finite-sample and asymptotic settings, we give conditions under which segment-based approaches dominate their route-based counterparts. We find that although route-based approaches can avoid the cumulative errors introduced by aggregating over individual road segments, this advantage is often offset by (significantly) smaller relevant sample sizes. For this reason, we recommend segment-based approaches if one has to choose between the two methods in practice. Our work highlights that the accuracy of travel time prediction is driven not just by the sophistication of the model, but also by the spatial granularity at which the model is applied.
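
To make the contrast between the two families concrete, the following is a minimal sketch and not the estimators defined in this work. It assumes the simplest possible versions: the segment-based predictor averages each segment's observed time over every trip that used it and sums along the route, while the route-based predictor averages total time over trips that followed exactly the same route (a stand-in for matching on origin and destination). The toy data and the names `trips`, `segment_based`, `route_based`, and `relevant_sample_sizes` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: each trip is a list of (segment_id, observed_seconds).
trips = [
    [("a", 60.0), ("b", 30.0)],
    [("a", 64.0), ("c", 45.0)],
    [("b", 34.0), ("c", 47.0)],
    [("a", 62.0), ("b", 32.0)],
]


def segment_based(trips, route):
    """Average each segment over all trips that traversed it, then sum along the route."""
    times = defaultdict(list)
    for trip in trips:
        for seg, seconds in trip:
            times[seg].append(seconds)
    return sum(mean(times[seg]) for seg in route)


def route_based(trips, route):
    """Average total time over trips whose segment sequence equals the route.

    Returns None when no training trip followed this exact route.
    """
    totals = [
        sum(seconds for _, seconds in trip)
        for trip in trips
        if [seg for seg, _ in trip] == list(route)
    ]
    return mean(totals) if totals else None


def relevant_sample_sizes(trips, route):
    """Count the observations each approach can actually use for this route."""
    per_segment = {
        seg: sum(seg in [s for s, _ in trip] for trip in trips) for seg in route
    }
    whole_route = sum([s for s, _ in trip] == list(route) for trip in trips)
    return per_segment, whole_route


route = ["a", "b"]
print(segment_based(trips, route))          # uses 3 observations per segment
print(route_based(trips, route))            # uses only the 2 matching trips
print(relevant_sample_sizes(trips, route))
```

In this toy setting, each segment of the route `a`, `b` is observed three times, but only two trips followed the whole route, which is the kind of gap in relevant sample size that the comparison turns on. The sketch says nothing about which predictor is more accurate; that depends on the conditions analyzed in the work itself.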
