In recent years, passively collected GPS data have been widely used in transportation studies such as highway performance monitoring, travel behavior analysis, and travel demand estimation. Despite their many advantages, these data suffer from oscillations (also called outliers or data jumps), which cannot be neglected because they may distort mobility patterns and lead to wrong or biased conclusions. For transportation studies driven by GPS data, ensuring data quality by removing the noise caused by oscillations is therefore essential. Most GPS-based studies simply remove oscillations by flagging points with unusually high speeds. However, this method can mistakenly identify normal points as oscillations. Other studies specifically address outlier removal in GPS data, but they all have limitations and do not suit passively collected GPS data. Many well-developed methods address the ping-pong phenomenon in cellular (cell tower) data, but oscillations in passively collected GPS data are very different: their patterns are far more varied and complicated, and they are more uncertain. Current methods are therefore insufficient for, and inapplicable to, passively collected GPS data. This paper aims to address oscillating points in passively collected GPS data. A set of heuristics is proposed that identifies oscillations by their abnormal movement patterns. The proposed heuristics fit the features of passively collected GPS data well, are adaptable to studies of different scales, and are computationally cost-effective compared with current methods.
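To make the speed-check baseline mentioned in the abstract concrete, the following is a minimal Python sketch of such a filter. It is not the paper's method: the function names `haversine_m` and `flag_by_speed`, the 50 m/s threshold, and the toy trace are all assumptions chosen for illustration. The sketch shows one way a pure speed check can mislabel a normal point: when a single fix jumps away and the next fix returns, both the jump and the return exceed the threshold, so the valid return point is flagged as well.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_by_speed(points, max_speed_mps=50.0):
    """Flag each point whose speed from the previous point exceeds the threshold.

    points: list of (t_seconds, lat, lon) tuples sorted by time.
    max_speed_mps: hypothetical threshold, not a value taken from the paper.
    """
    flags = [False] * len(points)
    for i in range(1, len(points)):
        t0, lat0, lon0 = points[i - 1]
        t1, lat1, lon1 = points[i]
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = haversine_m(lat0, lon0, lat1, lon1) / dt
        if speed > max_speed_mps:
            flags[i] = True
    return flags

# Toy trace (made-up coordinates): a slow walk with one fix that jumps ~5.5 km away.
trace = [
    (0, 38.9000, -77.0000),
    (10, 38.9001, -77.0000),
    (20, 38.9500, -77.0000),  # the oscillated fix
    (30, 38.9002, -77.0000),  # a normal fix, back on the real path
    (40, 38.9003, -77.0000),
]
print(flag_by_speed(trace))  # [False, False, True, True, False]
```

In this toy trace the fix at t = 30 is a normal point, yet it is flagged because the speed is computed relative to the oscillated fix before it. Removing flagged points one at a time and recomputing speeds would avoid this particular case, at extra cost.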