Generalist models, which perform diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Though a promising path toward general-purpose AI, existing generalist models are still at an early stage, with limited modality and task coverage. To empower multi-modal task-scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively, even in a single line of code. The system automatically generates task plans from such instructions for training and inference, and it facilitates multi-task training over diverse multi-modal workloads. As a starting point, OFASys ships with presets for 7 different modalities and 23 highly diverse example tasks, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves 95% of the performance on average with only 16% of the parameters of 15 task-finetuned models, showcasing the performance reliability of the multi-modal task-scaling provided by OFASys. Available at https://github.com/OFA-Sys/OFASys
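To make the declarative style concrete, the sketch below parses a one-line multi-modal instruction into its modality slots. The instruction string follows the `[MODALITY:name]` slot convention with `->` separating input from output templates; the parser itself is an illustrative assumption for this sketch, not the actual OFASys implementation.

```python
import re

# Hypothetical one-line instruction for an image-captioning task:
# modality slots are written as [MODALITY:name], and "->" separates
# the input template from the output template.
instruction = "[IMAGE:img] what does the image describe? -> [TEXT:cap]"

def parse_slots(instr: str):
    """Return (input_slots, output_slots) as lists of (modality, name) pairs."""
    inp, out = (part.strip() for part in instr.split("->"))
    pattern = re.compile(r"\[([A-Z]+):(\w+)\]")
    return pattern.findall(inp), pattern.findall(out)

input_slots, output_slots = parse_slots(instruction)
print(input_slots)   # [('IMAGE', 'img')]
print(output_slots)  # [('TEXT', 'cap')]
```

From such a parsed plan, a system can route each slot to the corresponding modality-specific preprocessor and adapter, which is the decoupling of task representation from model implementation that the abstract describes.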