Multimodal foundation models (MFMs) integrate diverse data modalities to support complex and wide-ranging tasks. However, this integration also introduces distinct safety and security challenges. In this paper, we unify the concepts of safety and security in the context of MFMs by identifying critical threats that arise from both model behavior and system-level interactions. We propose a taxonomy grounded in information theory that evaluates risks through the concepts of channel capacity, signal, noise, and bandwidth. This perspective provides a principled way to analyze how information flows through MFMs and how vulnerabilities can emerge across modalities. Building on this foundation, we introduce a deterministic minimax formulation to analyze defense mechanisms and expose structural vulnerabilities in multimodal systems. Our framework projects attacks onto the noise, signal, and bandwidth axes, which collapses the defense search space and mitigates the asymmetry that disadvantages defenders. In an evaluation of 15 defenses, we find that system-level bandwidth and behavior constraints generalize substantially better than brittle, model-only methods. Finally, we formalize an MFM "self-destruction threshold" that specifies when termination should be triggered, providing a concrete activation rule for circuit-breaker safeguards within multimodal systems.
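The abstract does not state the paper's equations. As a hedged illustration only, the classical Shannon-Hartley theorem, C = B log2(1 + S/N), relates the four quantities the taxonomy names: channel capacity C, bandwidth B, signal power S, and noise power N. The sketch below assumes this textbook formula as a stand-in and shows how capacity responds when one axis is changed while the others are held fixed. The numbers are hypothetical and are not results from the paper.

```python
import math

def shannon_capacity(bandwidth_hz: float, signal_power: float, noise_power: float) -> float:
    """Textbook Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second.

    Used here only as an illustrative analogy, not as the paper's formulation.
    """
    if bandwidth_hz < 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("bandwidth and signal must be non-negative; noise must be positive")
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)

# Hypothetical baseline channel: B = 1000 Hz, S/N = 3, so C = 1000 * log2(4) = 2000 bits/s.
baseline = shannon_capacity(1000.0, 3.0, 1.0)

# Change one axis at a time (illustrative values only).
more_noise = shannon_capacity(1000.0, 3.0, 3.0)   # noise axis: S/N drops to 1
less_signal = shannon_capacity(1000.0, 1.0, 1.0)  # signal axis: S/N drops to 1
narrower = shannon_capacity(500.0, 3.0, 1.0)      # bandwidth axis: B is halved

for name, cap in [("baseline", baseline), ("more_noise", more_noise),
                  ("less_signal", less_signal), ("narrower", narrower)]:
    print(f"{name}: {cap:.1f} bits/s")
```

In this toy setting, halving bandwidth halves capacity regardless of the signal-to-noise ratio, which gives one intuition for why a bandwidth constraint can act independently of how the signal or noise is manipulated.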