To build artificial neural networks analogous to biological intelligence systems, recent works have unified numerous tasks into a generalist model, which can process various tasks with shared parameters and without any task-specific modules. While generalist models achieve promising results on various benchmarks, their performance degrades on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main factor behind this phenomenon. To mitigate such interference, we introduce Conditional Mixture-of-Experts (Conditional MoEs) into generalist models. Routing strategies under different levels of conditions are proposed to take both the training/inference cost and the generalization ability into account. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver can effectively mitigate the interference across tasks and modalities, and achieves state-of-the-art results on a series of downstream tasks via prompt tuning on 1% of downstream data. Moreover, the introduction of Conditional MoEs preserves the generalization ability of generalist models to conduct zero-shot inference on new tasks, e.g., video-text retrieval and video captioning. Code and pre-trained generalist models shall be released.
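As a minimal sketch of the idea (not the paper's implementation), the following shows a conditional MoE layer whose routing decision depends on a condition embedding, e.g. a task or modality identifier, rather than on each token alone. All names, shapes, and the top-k gating scheme here are illustrative assumptions; the point is that tokens from different tasks can be dispatched to different expert subsets, reducing cross-task interference while keeping inference cost bounded by the number of selected experts.

```python
import numpy as np

# Illustrative sketch (hypothetical, not the paper's code): a Conditional MoE
# layer that routes on a condition embedding shared by all tokens of a task.
rng = np.random.default_rng(0)

D, E = 8, 4                                    # hidden size, number of experts
W_experts = rng.normal(size=(E, D, D)) * 0.1   # one linear expert per slot
W_route = rng.normal(size=(D, E)) * 0.1        # router over condition embeddings

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def conditional_moe(tokens, cond, top_k=2):
    """tokens: (N, D) inputs; cond: (D,) condition embedding (assumed given).

    All tokens under the same condition share one routing decision, so only
    top_k experts are executed per forward pass.
    """
    gate = softmax(W_route.T @ cond)           # (E,) routing weights
    top = np.argsort(gate)[-top_k:]            # indices of the selected experts
    w = gate[top] / gate[top].sum()            # renormalize over chosen experts
    out = np.zeros_like(tokens)
    for weight, e in zip(w, top):
        out += weight * (tokens @ W_experts[e])
    return out

# Two different "tasks" may route to different expert subsets.
x = rng.normal(size=(5, D))
cond_caption = rng.normal(size=D)
cond_retrieval = rng.normal(size=D)
y1 = conditional_moe(x, cond_caption)
y2 = conditional_moe(x, cond_retrieval)
print(y1.shape)
```

Conditioning the router on task/modality level signals, rather than per-token features, is what keeps routing stable across a task and cheap at inference; finer-grained conditions trade cost for flexibility, which is the trade-off the proposed routing strategies navigate.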