In this paper, we describe our work towards building a generic framework for both multi-modal embedding and multi-label binary classification tasks, developed while participating in Task 5 (Multimedia Automatic Misogyny Identification) of the SemEval 2022 competition. Since pretraining deep models from scratch is a resource- and data-hungry task, our approach is based on three main strategies. We combine different state-of-the-art architectures to capture a wide spectrum of semantic signals from the multi-modal input. We employ a multi-task learning scheme so that multiple datasets from the same knowledge domain can be used to improve the model's performance. We also use multiple objectives to regularize and fine-tune different system components.
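To make the multi-task, multi-objective idea concrete, the following is a minimal sketch (not the authors' implementation) of a fused multi-modal head trained with two jointly weighted objectives, assuming PyTorch, generic pre-trained text/image encoders that emit fixed-size embeddings, and illustrative dimensions, label counts, and loss weights.

```python
# Minimal illustrative sketch of a multi-modal, multi-task classifier head.
# All module names, dimensions, and loss weights below are assumptions for
# illustration, not details taken from the paper.
import torch
import torch.nn as nn


class MultiModalMultiTaskHead(nn.Module):
    """Fuses text and image embeddings and predicts two related tasks:
    a binary label (e.g., misogynous / not) and a set of multi-label
    categories. Encoder outputs are assumed to be precomputed tensors."""

    def __init__(self, text_dim=768, image_dim=512, hidden_dim=256, n_labels=4):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
        )
        self.binary_head = nn.Linear(hidden_dim, 1)            # binary task
        self.multilabel_head = nn.Linear(hidden_dim, n_labels)  # multi-label task

    def forward(self, text_emb, image_emb):
        fused = self.fusion(torch.cat([text_emb, image_emb], dim=-1))
        return self.binary_head(fused), self.multilabel_head(fused)


# Joint objective: a weighted sum of per-task binary cross-entropy losses,
# one common way to realize "multiple objectives" for regularization.
model = MultiModalMultiTaskHead()
bce = nn.BCEWithLogitsLoss()
text_emb = torch.randn(8, 768)    # stand-ins for text encoder outputs
image_emb = torch.randn(8, 512)   # stand-ins for image encoder outputs
y_binary = torch.randint(0, 2, (8, 1)).float()
y_multi = torch.randint(0, 2, (8, 4)).float()

logit_a, logit_b = model(text_emb, image_emb)
loss = bce(logit_a, y_binary) + 0.5 * bce(logit_b, y_multi)
loss.backward()
```

Sharing the fusion layer across both heads is what lets additional datasets from the same domain act as auxiliary supervision, while the weighted loss combination plays the regularizing role described above.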