TranViT: An Integrated Vision Transformer Framework for Discrete Transit Travel Time Range Prediction

Accurate travel time estimation is paramount for providing transit users with reliable schedules and dependable real-time information. This paper proposes and evaluates a novel end-to-end framework for transit and roadside image data acquisition, labeling, and model training to predict transit travel times across a segment of interest. General Transit Feed Specification (GTFS) real-time data is used as an activation mechanism for a roadside camera unit monitoring a segment of Massachusetts Avenue in Cambridge, MA. Ground truth labels are generated for the acquired images based on the observed travel time percentiles across the monitored segment, obtained from Automated Vehicle Location (AVL) data. The labeled image dataset is then used to train and evaluate a Vision Transformer (ViT) model that predicts a discrete transit travel time range (band). The results of this exploratory study show that the ViT model can learn the image features and contents that best help it deduce the expected travel time range, with an average validation accuracy between 80% and 85%. We also demonstrate how this discrete travel time band prediction can subsequently be used to improve continuous transit travel time estimation. The workflow and results presented in this study provide an end-to-end, scalable, automated, and highly efficient approach for integrating traditional transit data sources and roadside imagery to improve the estimation of transit travel duration. This work also demonstrates the value of incorporating real-time information from computer-vision sources, which are becoming increasingly accessible and can have major implications for improving operations and passenger real-time information.
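The percentile-based labeling step described above can be illustrated with a minimal sketch. The abstract does not state how many bands are used or where the cut points fall, so the quartile cut points (25th, 50th, 75th percentiles), the helper names `percentile`, `band_edges`, and `assign_band`, and the sample travel times below are all assumptions made for illustration, not the authors' implementation.

```python
from bisect import bisect_right

def percentile(sorted_values, p):
    """Linearly interpolated p-th percentile (0-100) of an ascending list."""
    rank = (len(sorted_values) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)

def band_edges(travel_times_s, percentiles=(25, 50, 75)):
    """Cut points (seconds) separating the travel time bands.

    The quartile defaults are an assumption; the source only says that
    labels are based on observed travel time percentiles.
    """
    ordered = sorted(travel_times_s)
    return [percentile(ordered, p) for p in percentiles]

def assign_band(travel_time_s, edges):
    """Map one travel time to a band index, where 0 is the fastest band.

    A value equal to a cut point falls into the slower (higher) band.
    """
    return bisect_right(edges, travel_time_s)

# Hypothetical AVL travel times (seconds) across the monitored segment.
observed = [95, 110, 120, 130, 150, 170, 200, 240, 300]
edges = band_edges(observed)
# Each image captured during a trip would receive that trip's band as its label.
labels = [assign_band(t, edges) for t in observed]
```

Under these assumptions, each roadside image inherits the band index of the transit trip that triggered its capture, which turns travel time prediction into an image classification task suited to a ViT classifier.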
