Harnessing the benefits of drones for urban innovation at scale requires reliable aerial autonomy. A major barrier to advancing aerial autonomy has been the collection of large-scale aerial datasets for training machine learning models. Because collecting real-world data by deploying drones is costly and time-consuming, there has been an increasing shift toward using synthetic data to train models for drone applications. However, to improve the generalizability of policies trained on synthetic data, it is crucial to incorporate domain randomization into the data generation workflow to address the sim-to-real problem. Current synthetic data generation tools either lack domain randomization or rely heavily on manual effort or real samples to configure and generate diverse, realistic simulation scenes. These dependencies limit the scalability of the data generation workflow. Accordingly, balancing generalizability and scalability remains a major challenge in synthetic data generation. To address these gaps, we introduce a modular, scalable data generation workflow tailored to aerial autonomy applications. To generate realistic configurations of simulation scenes while increasing diversity, we present an adaptive layered domain randomization approach that creates a type-agnostic distribution space for assets over the base map of each environment before generating poses along drone trajectories. We leverage high-level scene structures to automatically place assets in valid configurations and then further extend diversity through obstacle generation and global parameter randomization. We demonstrate the effectiveness of our method in automatically generating diverse configurations and datasets, and show its potential for optimizing downstream performance. Our work contributes to generating enhanced benchmark datasets for training models that generalize better to real-world situations.
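The layered randomization idea can be illustrated with a minimal sketch. It assumes a 2D base map whose valid placement regions (for example sidewalks or plazas, derived from high-level scene structure) are given as axis-aligned rectangles. All class names, function names, and parameter ranges here are hypothetical and are not the authors' implementation; the "adaptive" aspect of the approach is not modeled. Layer 1 draws every asset from one shared, area-weighted distribution over valid regions (type-agnostic), layer 2 scatters obstacles in the flight volume, and layer 3 randomizes global parameters. Drone pose generation would run afterwards on the finished configuration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Region:
    # Hypothetical valid placement region on the base map (axis-aligned rectangle).
    name: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    def sample_point(self, rng: random.Random) -> tuple:
        return (rng.uniform(self.x_min, self.x_max), rng.uniform(self.y_min, self.y_max))


@dataclass
class SceneConfig:
    assets: list = field(default_factory=list)          # (asset_id, x, y, yaw_deg)
    obstacles: list = field(default_factory=list)       # (x, y, z, radius)
    global_params: dict = field(default_factory=dict)   # scene-wide settings


def layer_assets(regions, asset_ids, n, rng):
    # Layer 1 (type-agnostic): every asset type shares the same area-weighted
    # distribution over valid regions, so no per-type placement rules are needed.
    weights = [r.area() for r in regions]
    placed = []
    for _ in range(n):
        region = rng.choices(regions, weights=weights, k=1)[0]
        x, y = region.sample_point(rng)
        placed.append((rng.choice(asset_ids), x, y, rng.uniform(0.0, 360.0)))
    return placed


def layer_obstacles(bounds, n, rng, z_range=(2.0, 30.0)):
    # Layer 2: free-floating obstacles scattered through the flight volume.
    x_min, x_max, y_min, y_max = bounds
    return [
        (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max),
         rng.uniform(*z_range), rng.uniform(0.5, 3.0))
        for _ in range(n)
    ]


def layer_globals(rng):
    # Layer 3: global parameters (hypothetical names and ranges).
    return {
        "sun_elevation_deg": rng.uniform(5.0, 85.0),
        "fog_density": rng.uniform(0.0, 0.3),
        "time_of_day_h": rng.uniform(6.0, 20.0),
    }


def generate_scene(regions, asset_ids, bounds, seed, n_assets=20, n_obstacles=10):
    # Apply the layers in order with one seeded RNG so each scene is reproducible.
    rng = random.Random(seed)
    cfg = SceneConfig()
    cfg.assets = layer_assets(regions, asset_ids, n_assets, rng)
    cfg.obstacles = layer_obstacles(bounds, n_obstacles, rng)
    cfg.global_params = layer_globals(rng)
    return cfg


if __name__ == "__main__":
    regions = [Region("sidewalk", 0, 10, 0, 2), Region("plaza", 20, 30, 20, 30)]
    scene = generate_scene(regions, ["car", "bench", "tree"], (0, 50, 0, 50), seed=0)
    print(len(scene.assets), len(scene.obstacles), sorted(scene.global_params))
```

Because all asset types draw from the same region distribution, adding a new asset class in this sketch requires no new placement logic; only the list of `asset_ids` changes.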