In line with the current trend in the AI community, we present the AI Journey 2021 Challenge called Fusion Brain, the first competition aimed at building a universal architecture that can process different modalities (in this case, images, texts, and code) and solve multiple vision and language tasks. The Fusion Brain Challenge combines four specific tasks: Code2code Translation, Handwritten Text Recognition, Zero-shot Object Detection, and Visual Question Answering. We have created a dataset for each task to evaluate participants' submissions. Moreover, we have collected and made publicly available a new handwritten dataset in both English and Russian, which consists of 94,128 image-text pairs. We also propose a multimodal, multitask baseline architecture that is built around a frozen foundation model and trained in Fusion mode as well as in Single-task mode. The proposed Fusion approach proves to be competitive with, and more energy-efficient than, the task-specific one.
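To make the idea of a shared frozen model with task-specific parts more concrete, the following is a minimal sketch in plain Python. It is a toy illustration under our own assumptions, not the baseline itself: the "foundation model" is a fixed identity map, each task gets a small linear head, and "Fusion mode" is read here as interleaving training examples from all tasks over the one shared frozen model. The names `BACKBONE`, `backbone_features`, `train_fusion`, and `TOY_TASKS` are hypothetical and do not come from the challenge code.

```python
import random

# Hypothetical frozen "foundation model": a fixed linear map (identity here).
# Assumption for illustration only; the real baseline uses a large pretrained model.
BACKBONE = [[1.0, 0.0], [0.0, 1.0]]


def backbone_features(x, weights=BACKBONE):
    """Apply the shared feature extractor; its weights are only read, never updated."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_fusion(tasks, steps=2000, lr=0.1, seed=0):
    """Train one linear head per task on top of the shared frozen backbone.

    Examples from all tasks are interleaved in a single training loop,
    which is our assumed reading of "Fusion mode".
    """
    rng = random.Random(seed)
    heads = {name: [0.0] * len(BACKBONE) for name in tasks}
    names = list(tasks)
    for step in range(steps):
        name = names[step % len(names)]  # alternate between tasks
        x, y = rng.choice(tasks[name])
        feats = backbone_features(x)
        err = sum(h * f for h, f in zip(heads[name], feats)) - y
        # SGD step on squared error; only this task's head changes.
        heads[name] = [h - lr * err * f for h, f in zip(heads[name], feats)]
    return heads


# Two toy regression tasks that share inputs but need different heads.
TOY_TASKS = {
    "task_a": [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)],
    "task_b": [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)],
}
heads = train_fusion(TOY_TASKS)
```

In this sketch, a Single-task run would correspond to calling `train_fusion` with a dictionary that contains only one task. The point of the shared frozen component is that its parameters are stored once and excluded from the updates, so only the small per-task heads are trained.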