Multimodal deep learning systems are deployed in dynamic scenarios because of the robustness afforded by multiple sensing modalities. Nevertheless, they struggle with varying compute resource availability (due to multi-tenancy, device heterogeneity, etc.) and fluctuating input quality (from sensor feed corruption, environmental noise, etc.). Statically provisioned multimodal systems cannot adapt when compute resources change over time, while existing dynamic networks struggle to meet strict compute budgets. Moreover, both kinds of systems often neglect variations in modality quality, so a heavily corrupted modality may needlessly consume resources better allocated to other modalities. We propose ADMN, a layer-wise Adaptive Depth Multimodal Network that tackles both challenges: it adjusts the total number of active layers across all modalities to satisfy strict compute constraints, and it continually reallocates layers across input modalities according to their quality. Our evaluations show that ADMN matches the accuracy of state-of-the-art networks while reducing their floating-point operations by up to 75%.
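To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of layer-wise adaptive depth under a compute budget: each modality encoder is a stack of blocks of which only a prefix executes, and a total layer budget is split across modalities in proportion to estimated input quality. The names (`AdaptiveDepthEncoder`, `allocate_layers`, the proportional allocation rule, and the quality scores) are illustrative assumptions; the paper's actual controller may differ.

```python
# Hypothetical sketch of quality-driven layer allocation across two
# modality encoders under a fixed total layer budget (proxy for FLOPs).
import torch
import torch.nn as nn

class Block(nn.Module):
    """One residual MLP block standing in for a Transformer layer."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU())
    def forward(self, x):
        return x + self.net(x)

class AdaptiveDepthEncoder(nn.Module):
    """Stack of blocks; only the first `depth` blocks run at inference."""
    def __init__(self, dim, num_layers):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(num_layers))
    def forward(self, x, depth):
        for block in self.blocks[:depth]:
            x = block(x)
        return x

def allocate_layers(quality, budget, min_per_modality=1):
    """Split a total layer budget across modalities proportionally to
    estimated input quality, keeping at least one layer per modality.
    (Proportional allocation is an assumption for illustration.)"""
    m = quality.numel()
    share = quality / quality.sum()
    alloc = torch.clamp(
        (share * (budget - m * min_per_modality)).round().long() + min_per_modality,
        min=min_per_modality,
    )
    # Fix rounding drift so the allocation exactly meets the budget.
    alloc[alloc.argmax()] += budget - alloc.sum()
    return alloc

# Example: a 12-layer budget split between a clean and a corrupted modality.
dim = 64
enc_a = AdaptiveDepthEncoder(dim, num_layers=12)
enc_b = AdaptiveDepthEncoder(dim, num_layers=12)
x_a, x_b = torch.randn(8, dim), torch.randn(8, dim)

quality = torch.tensor([0.9, 0.2])   # e.g. from a lightweight quality estimator
depths = allocate_layers(quality, budget=12)   # -> e.g. [9, 3]
fused = enc_a(x_a, depths[0].item()) + enc_b(x_b, depths[1].item())
```

Under these assumptions, lowering `budget` directly caps compute, while shifting `quality` moves layers away from a corrupted modality toward a clean one, which is the reallocation behavior the abstract describes.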