This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field. We start with the current state-of-the-art approaches in the two subfields of deep learning individually. We then discuss modeling frameworks in which one modality is transformed into the other, as well as models in which one modality is used to enhance representation learning for the other. To conclude the second part, we introduce architectures that focus on handling both modalities simultaneously. Finally, we cover other modalities as well as general-purpose multimodal models, which can handle different tasks on different modalities within one unified architecture. One interesting application, generative art, caps off the book.