Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. To accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning. Simply applying methods proposed in different research areas can improve the state-of-the-art performance on 9 of the 15 datasets. MultiBench thus represents a milestone in unifying disjoint efforts in multimodal research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, while ensuring ease of use, accessibility, and reproducibility. MultiBench, our standardized code, and leaderboards are publicly available and will be regularly updated, and we welcome input from the community.