Computer Vision for Transit Travel Time Prediction: An End-to-End Framework Using Roadside Urban Imagery

Accurate travel time estimation is essential for giving transit users reliable schedules and dependable real-time information. This paper is the first to use roadside urban imagery for direct prediction of transit travel time. We propose and evaluate an end-to-end framework that integrates traditional transit data sources with a roadside camera to automate image data acquisition, labeling, and model training for predicting transit travel times across a segment of interest. First, we show how GTFS Realtime data can serve as an efficient activation mechanism for a roadside camera unit monitoring the segment of interest. Second, automatic vehicle location (AVL) data is used to generate ground-truth labels for the acquired images, based on the percentiles of transit travel times observed across the camera-monitored segment at the time of image acquisition. Finally, the resulting labeled image dataset is used to train and thoroughly evaluate a Vision Transformer (ViT) model that predicts a discrete transit travel time range (band). The results show that the ViT model learns the image features and contents most useful for inferring the expected travel time range, reaching an average validation accuracy between 80% and 85%. We assess the interpretability of the ViT model's predictions and show how the discrete travel time band prediction can subsequently improve continuous transit travel time estimation. The workflow and results presented in this study provide an end-to-end, scalable, automated, and efficient approach to integrating traditional transit data sources with roadside imagery to improve the estimation of transit travel times. This work also demonstrates the value of incorporating real-time information from computer vision sources, which are becoming increasingly accessible and can have major implications for improving transit operations and real-time passenger information.
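
As a rough illustration of the activation step, the sketch below checks vehicle positions against the start of the monitored segment and returns the vehicles close enough to warrant capturing images. It assumes positions have already been decoded from a GTFS Realtime vehicle positions feed into a simplified, hypothetical `VehiclePosition` record. The 300 m trigger radius, the segment coordinates, and the function names are illustrative assumptions, not parameters taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class VehiclePosition:
    # Hypothetical, simplified stand-in for a decoded GTFS Realtime vehicle position
    vehicle_id: str
    route_id: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def vehicles_triggering_capture(positions, route_id, seg_lat, seg_lon, radius_m=300.0):
    """Return IDs of vehicles on route_id within radius_m of the segment start.

    In this sketch, a non-empty result is the (assumed) signal to activate the camera.
    """
    return [
        v.vehicle_id
        for v in positions
        if v.route_id == route_id
        and haversine_m(v.lat, v.lon, seg_lat, seg_lon) <= radius_m
    ]


if __name__ == "__main__":
    feed = [
        VehiclePosition("A", "R1", 40.001, -75.0),   # about 111 m from the segment start
        VehiclePosition("B", "R1", 40.010, -75.0),   # about 1.1 km away
        VehiclePosition("C", "R2", 40.0005, -75.0),  # close, but on another route
    ]
    print(vehicles_triggering_capture(feed, "R1", 40.0, -75.0))  # ['A']
```

A distance threshold keeps the example self-contained; an implementation could just as well trigger on predicted arrival times at the segment, since GTFS Realtime also publishes trip updates.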

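The AVL-based labeling step can be sketched in a similar spirit. The version below assumes AVL pings have already been reduced to segment traversals (entry and exit times). It splits the observed travel times into equally populated bands with Python's `statistics.quantiles`, then labels an image with the band of the traversal in progress when the image was captured. The `Traversal` schema, the three-band split, and the "traversal in progress" matching rule are illustrative assumptions rather than the paper's exact procedure.

```python
import bisect
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Traversal:
    # Hypothetical record: one vehicle's pass through the monitored segment, derived from AVL pings
    entry_s: float  # time the vehicle entered the segment (seconds)
    exit_s: float   # time the vehicle left the segment (seconds)

    @property
    def travel_time(self) -> float:
        return self.exit_s - self.entry_s


def band_edges(traversals: List[Traversal], n_bands: int = 3) -> List[float]:
    """Percentile cut points splitting observed travel times into n_bands equally populated bands."""
    return statistics.quantiles([t.travel_time for t in traversals], n=n_bands)


def label_image(image_ts: float, traversals: List[Traversal], edges: List[float]) -> Optional[int]:
    """Band index (0 = fastest) of the traversal in progress at image_ts, or None if none is."""
    for t in traversals:
        if t.entry_s <= image_ts <= t.exit_s:
            return bisect.bisect_right(edges, t.travel_time)
    return None


if __name__ == "__main__":
    travel_times = [60, 70, 80, 90, 100, 110, 120, 130, 140]
    history = [Traversal(i * 1000, i * 1000 + tt) for i, tt in enumerate(travel_times)]
    edges = band_edges(history)               # roughly [83.3, 116.7]
    print(label_image(1030, history, edges))  # 0: 70 s traversal, fast band
    print(label_image(4050, history, edges))  # 1: 100 s traversal, middle band
    print(label_image(8050, history, edges))  # 2: 140 s traversal, slow band
    print(label_image(500, history, edges))   # None: no vehicle in the segment
```

Equally populated (tercile) bands keep the classes roughly balanced for training a classifier such as the ViT; other percentile thresholds would only change `n_bands` or how `band_edges` computes the cut points.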