Self-Supervised Representation Learning (SSRL) has recently attracted much attention in the fields of computer vision, speech, and natural language processing (NLP), and more recently with other types of modalities, including time series from sensors. The popularity of self-supervised learning is driven by the fact that traditional models typically require a huge amount of well-annotated data for training. Acquiring annotated data can be a difficult and costly process. Self-supervised methods have been introduced to improve the efficiency of training data through discriminative pre-training of models using supervisory signals that are freely obtained from the raw data. Unlike existing reviews of SSRL, which have predominantly focused on methods in the fields of CV or NLP for a single modality, we aim to provide the first comprehensive review of multimodal self-supervised learning methods for temporal data. To this end, we 1) provide a comprehensive categorization of existing SSRL methods, 2) introduce a generic pipeline by defining the key components of an SSRL framework, 3) compare existing models in terms of their objective function, network architecture, and potential applications, and 4) review existing multimodal techniques in each category and across various modalities. Finally, we present existing weaknesses and future opportunities. We believe our work develops a perspective on the requirements of SSRL in domains that utilise multimodal and/or temporal data.