Visual question answering (VQA), which leverages multi-modality data, has attracted intensive interest in real-life applications such as home robots and clinical diagnosis. Nevertheless, one of its challenges is designing robust learning across different client tasks. This work aims to bridge the gap between the prerequisite of large-scale training data and the constraints on sharing client data, which arise mainly from confidentiality requirements. We propose Unidirectional Split Learning with Contrastive Loss (UniCon) to train VQA models on distributed data silos. In particular, UniCon trains a global model over the entire data distribution of the different clients, learning refined cross-modal representations via contrastive learning. The learned representations of the global model aggregate knowledge from the different local tasks. Moreover, we devise a unidirectional split learning framework to enable more efficient knowledge sharing. Comprehensive experiments with five state-of-the-art VQA models on the VQA-v2 dataset demonstrate the efficacy of UniCon, which achieves an accuracy of 49.89% on the VQA-v2 validation set. This work is the first study of VQA under data confidentiality constraints using self-supervised split learning.
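The abstract does not spell out the exact contrastive objective. As a rough illustration only, below is a minimal sketch of a symmetric InfoNCE-style cross-modal contrastive loss between image and question embeddings, assuming a PyTorch setup; the function name `cross_modal_infonce`, the arguments `img_emb` and `txt_emb`, and the `temperature` value are illustrative assumptions, not UniCon's actual formulation.

```python
import torch
import torch.nn.functional as F

def cross_modal_infonce(img_emb, txt_emb, temperature=0.07):
    """Illustrative symmetric InfoNCE loss over a batch of
    (image, question) embedding pairs; matched pairs share an index."""
    # L2-normalize each modality's embeddings
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    # Pairwise similarity logits: matched pairs lie on the diagonal
    logits = img @ txt.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric objective: align image-to-text and text-to-image
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

In a split learning setting such as UniCon's, a loss of this kind would plausibly be computed on the server side over the intermediate representations produced by client-held model splits, though the paper's abstract does not specify where the loss is evaluated.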