Fleets of networked autonomous vehicles (AVs) collect terabytes of sensory data, which is often transmitted to central servers (the ``cloud'') for training machine learning (ML) models. Ideally, these fleets should upload all their data, especially from rare operating contexts, in order to train robust ML models. However, this is infeasible due to prohibitive network bandwidth and data labeling costs. Instead, we propose a cooperative data sampling strategy where geo-distributed AVs collaborate to collect a diverse ML training dataset in the cloud. Since the AVs have a shared objective but minimal information about each other's local data distribution and perception model, we can naturally cast cooperative data collection as an $N$-player mathematical game. We show that our cooperative sampling strategy uses minimal information to converge to a centralized oracle policy with complete information about all AVs. Moreover, we theoretically characterize the performance benefits of our game-theoretic strategy compared to greedy sampling. Finally, we experimentally demonstrate that our method outperforms standard benchmarks by up to $21.9\%$ on 4 perception datasets, including for autonomous driving in adverse weather conditions. Crucially, our experimental results on real-world datasets closely align with our theoretical guarantees.