LiDAR-based perception in intelligent transportation systems (ITS) relies on deep neural networks trained with large-scale labeled datasets. However, creating such datasets is expensive, time-consuming, and labor-intensive, limiting the scalability of perception systems. Sim2Real learning offers a scalable alternative, but its success depends on the simulation's fidelity to real-world environments, dynamics, and sensors. This tutorial introduces a reproducible workflow for building high-fidelity digital twins (HiFi DTs) to generate realistic synthetic datasets. We outline practical steps for modeling static geometry, road infrastructure, and dynamic traffic using open-source resources such as satellite imagery, OpenStreetMap, and sensor specifications. The resulting environments support scalable and cost-effective data generation for robust Sim2Real learning. Using this workflow, we have released three synthetic LiDAR datasets, namely UT-LUMPI, UT-V2X-Real, and UT-TUMTraf-I, which closely replicate real locations; models trained on them outperform real-data-trained baselines on perception tasks. This guide enables broader adoption of HiFi DTs in ITS research and deployment.
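As a minimal illustration of the open-data step mentioned above, road infrastructure around a site can be pulled from OpenStreetMap via the Overpass API. The sketch below only constructs the Overpass QL query string; the coordinates and the helper name are hypothetical placeholders, not values from the released datasets.

```python
# Sketch: build an Overpass QL query for road geometry around a
# hypothetical intersection. The resulting string can be POSTed to a
# public Overpass endpoint to retrieve OSM ways tagged as highways,
# which serve as a starting point for modeling road infrastructure.
def overpass_road_query(lat: float, lon: float, radius_m: int = 300) -> str:
    """Return an Overpass QL query for highway ways near (lat, lon)."""
    return (
        "[out:json][timeout:25];"
        f'way["highway"](around:{radius_m},{lat},{lon});'
        "out geom;"
    )

# Hypothetical coordinates for illustration only.
query = overpass_road_query(48.137, 11.575)
print(query)
```

The `out geom;` directive asks Overpass to return per-way node coordinates directly, so the response can be converted into road polylines without a second lookup.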