Multi-modal learning has emerged as a key technique for improving performance across domains such as autonomous driving, robotics, and reasoning. However, in certain scenarios, particularly in resource-constrained environments, some modalities available during training may be absent during inference. While existing frameworks effectively utilize multiple data sources during training and enable inference with reduced modalities, they are primarily designed for single-agent settings. This poses a critical limitation in dynamic environments such as connected autonomous vehicles (CAVs), where incomplete data coverage can lead to decision-making blind spots. Conversely, some works explore multi-agent collaboration but do not address missing modalities at test time. To overcome these limitations, we propose Collaborative Auxiliary Modality Learning (CAML), a novel multi-modal multi-agent framework that enables agents to collaborate and share multi-modal data during training, while allowing inference with reduced modalities during testing. Experimental results in collaborative decision-making for CAVs in accident-prone scenarios demonstrate that CAML achieves up to a 58.1% improvement in accident detection. Additionally, we validate CAML on real-world aerial-ground robot data for collaborative semantic segmentation, achieving up to a 10.6% improvement in mIoU.
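The train-with-more, infer-with-less pattern the abstract describes can be pictured as a cross-modal distillation setup: a multi-modal, multi-agent branch supervises a reduced-modality single-agent branch that is the only one needed at test time. The PyTorch sketch below is a minimal illustration of that pattern under assumed names and losses (Encoder, teacher_rgb, teacher_lidar, student_rgb, head, and an MSE feature-matching term); the abstract does not specify CAML's actual architecture, so none of this should be read as the paper's method.

```python
# Minimal sketch of the train-rich / infer-light pattern (an assumption,
# not CAML's published architecture).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Illustrative per-modality feature encoder."""
    def __init__(self, in_dim, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim)
        )
    def forward(self, x):
        return self.net(x)

# Teacher branch: multi-modal features shared across agents (training only).
# Student branch: ego agent with a single reduced modality (test time).
teacher_rgb, teacher_lidar = Encoder(512), Encoder(256)
student_rgb = Encoder(512)
head = nn.Linear(64, 2)  # e.g., accident / no-accident decision

def training_step(rgb, lidar, label):
    # Fuse collaborative multi-modal features; in a multi-agent setting
    # these would be aggregated from other agents' shared observations.
    with torch.no_grad():
        fused = teacher_rgb(rgb) + teacher_lidar(lidar)
    feat = student_rgb(rgb)
    task_loss = nn.functional.cross_entropy(head(feat), label)
    # Auxiliary-modality supervision: pull reduced-modality features
    # toward the fused multi-modal features.
    distill_loss = nn.functional.mse_loss(feat, fused)
    return task_loss + distill_loss

def inference(rgb):
    # Test time: only the ego agent's reduced modality is required.
    return head(student_rgb(rgb)).argmax(dim=-1)

# Example: train on paired modalities, then infer from RGB alone.
loss = training_step(torch.randn(4, 512), torch.randn(4, 256),
                     torch.randint(0, 2, (4,)))
pred = inference(torch.randn(4, 512))
```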