The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different signals. Despite these empirical advances, there remain fundamental research questions: how can we quantify the nature of interactions that exist among input features, and how can we capture these interactions using suitable data-driven methods? To answer these questions, we propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy across input features, which we term the PID statistics of a multimodal distribution. Using two newly proposed estimators that scale to high-dimensional distributions, we demonstrate their usefulness in quantifying the interactions within multimodal datasets, the nature of interactions captured by multimodal models, and principled approaches for model selection. We conduct extensive experiments on both synthetic datasets where the PID statistics are known and on large-scale multimodal benchmarks where PID estimation was previously impossible. Finally, to demonstrate the real-world applicability of our approach, we present three case studies in pathology, mood prediction, and robotic perception where our framework accurately recommends strong multimodal models for each application.
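For context, the redundancy, uniqueness, and synergy terms referenced above correspond to the standard partial information decomposition (PID) of Williams and Beer, which splits the total mutual information that two features $X_1, X_2$ carry about a target $Y$ into four non-negative parts. A minimal statement of that decomposition, using illustrative notation rather than the exact symbols of this paper, is:
\[
I(X_1, X_2; Y) = R + U_1 + U_2 + S, \qquad
I(X_1; Y) = R + U_1, \qquad
I(X_2; Y) = R + U_2,
\]
where $R$ is the information about $Y$ redundant to both features, $U_1$ and $U_2$ are the pieces unique to each feature, and $S$ is the synergy that emerges only when the features are combined.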