Traffic forecasting is an important element of mobility management and a key driver of the logistics industry. Over the years, much work has been done on traffic forecasting using time series as well as spatiotemporal dynamic forecasting. In this paper, we explore the use of a vision transformer in a UNet setting. We completely remove all convolution-based building blocks in UNet and use 3D shifted-window transformers in both the encoder and decoder branches. In addition, we experiment with feature mixing just before patch encoding to control the inter-relationship of the features while avoiding contraction of the depth dimension of our spatiotemporal input. The proposed network is tested on the data provided by the Traffic Map Movie Forecasting Challenge 2021 (Traffic4cast2021), held in the competition track of Neural Information Processing Systems (NeurIPS). The Traffic4cast2021 task is to predict an hour (6 frames) of traffic conditions (volume and average speed) from one hour of given traffic state (12 frames, each averaged over a 5-minute time span). Source code is available online at https://github.com/bojesomo/Traffic4Cast2021-SwinUNet3D.
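The idea of mixing features without contracting the depth dimension can be illustrated with a small sketch. It is not taken from the released source code: it assumes one possible reading, in which a single shared linear map is applied to the channel vector at every (frame, row, column) position, so the 12 input frames stay a separate axis instead of being folded into the channels. The `[T][H][W][C]` layout, the tiny grid, the two input channels (volume and speed), and the `weights` values are all illustrative assumptions.

```python
# Illustrative sketch only: pointwise feature mixing over the channel axis.
# The [T][H][W][C] layout, the sizes, and the weights are assumptions,
# not details taken from the paper or its repository.

def mix_features(clip, weights):
    """Apply one shared linear map to the channel vector at every (t, y, x).

    clip:    nested lists indexed [T][H][W][C_in]
    weights: nested lists indexed [C_out][C_in]
    """
    return [
        [
            [
                [sum(w * v for w, v in zip(row, voxel)) for row in weights]
                for voxel in line
            ]
            for line in frame
        ]
        for frame in clip
    ]


def shape(nested):
    """Return the shape of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


T, H, W = 12, 2, 3  # 12 input frames; tiny spatial grid for illustration
clip = [[[[1.0, 2.0] for _ in range(W)] for _ in range(H)] for _ in range(T)]

# Hypothetical 2 -> 3 channel mixing weights (in a network these would be learned).
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

mixed = mix_features(clip, weights)
print(shape(clip))   # (12, 2, 3, 2)
print(shape(mixed))  # (12, 2, 3, 3)
```

Only the channel axis changes size. The depth axis of 12 frames passes through untouched, which is what would let a subsequent 3D patch encoding still see the temporal structure.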