The online emergence of multi-modal sharing platforms (e.g., TikTok, YouTube) is powering personalized recommender systems to incorporate various modalities (e.g., visual, textual, and acoustic) into latent user representations. While existing works on multi-modal recommendation exploit multimedia content features to enhance item embeddings, their representation capability is limited by heavy reliance on labels and weak robustness on sparse user behavior data. Inspired by the recent progress of self-supervised learning in alleviating the label scarcity issue, we explore deriving self-supervision signals for effectively learning modality-aware user preferences and cross-modal dependencies. To this end, we propose a new Multi-Modal Self-Supervised Learning (MMSSL) method that tackles two key challenges. Specifically, to characterize the inter-dependency between the user-item collaborative view and the item multi-modal semantic view, we design a modality-aware interactive structure learning paradigm that uses adversarial perturbations for data augmentation. In addition, to capture the effects by which users' modality-aware interaction patterns interweave with each other, a cross-modal contrastive learning approach is introduced to jointly preserve inter-modal semantic commonality and user preference diversity. Experiments on real-world datasets verify the superiority of our method over various state-of-the-art baselines, demonstrating its great potential for multimedia recommendation. The implementation is released at: https://github.com/HKUDS/MMSSL.
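As a hedged illustration only (not the paper's actual implementation), the cross-modal contrastive objective described above can be sketched as an InfoNCE-style loss: for each user, the embeddings of the same user under two different modalities (e.g., visual and textual views) form a positive pair, while other users in the batch serve as negatives. All names, dimensions, and the temperature value below are assumptions for the sketch.

```python
import numpy as np

def info_nce(anchor, positive, temperature=0.2):
    """InfoNCE contrastive loss between two modality-specific embedding matrices.

    anchor, positive: (n_users, d) L2-normalized embeddings of the same users
    under two modality views. Each user's cross-modal pair is the positive;
    all other users in the batch act as negatives (assumed setup, for
    illustration only).
    """
    # Cosine similarity between every anchor/positive combination.
    logits = anchor @ positive.T / temperature          # shape (n, n)
    # Numerically stable log-softmax over each row.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal; maximize their log-probability.
    return -np.mean(np.diag(log_prob))

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Toy example: 4 users, 8-dim embeddings per modality; the "textual" view is
# synthetically correlated with the "visual" view to mimic shared semantics.
rng = np.random.default_rng(0)
visual = l2norm(rng.normal(size=(4, 8)))
textual = l2norm(0.9 * visual + 0.1 * rng.normal(size=(4, 8)))
loss = info_nce(visual, textual)
```

Minimizing such a loss pulls a user's representations across modalities together while pushing apart different users, which is one common way to preserve inter-modal semantic commonality alongside user preference diversity.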