This paper studies the multi-modal recommendation problem, in which multi-modal item information (e.g., images and textual descriptions) is exploited to improve recommendation accuracy. Besides the user-item interaction graph, existing state-of-the-art methods usually use auxiliary graphs (e.g., user-user or item-item relation graphs) to augment the learned representations of users and/or items. These representations are often propagated and aggregated on the auxiliary graphs using graph convolutional networks, which can be prohibitively expensive in computation and memory, especially for large graphs. Moreover, existing multi-modal recommendation methods usually rely on randomly sampled negative examples in the Bayesian Personalized Ranking (BPR) loss to guide the learning of user/item representations, which increases the computational cost on large graphs and may also introduce noisy supervision signals into training. To tackle these issues, we propose a novel self-supervised multi-modal recommendation model, dubbed BM3, which requires neither augmentations from auxiliary graphs nor negative samples. Specifically, BM3 first bootstraps latent contrastive views from the representations of users and items with a simple dropout augmentation. It then jointly optimizes three multi-modal objectives to learn the representations of users and items by reconstructing the user-item interaction graph and aligning modality features from both inter- and intra-modality perspectives. BM3 thus removes both the need to contrast against negative examples and the reliance on complex graph augmentation from an additional target network for contrastive view generation. We show that BM3 outperforms prior recommendation models on three datasets whose node counts range from 20K to 200K, while achieving a 2x to 9x reduction in training time. Our code is available at https://github.com/enoche/BM3.
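To make the idea of dropout-bootstrapped views and a negative-free objective concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation (see the repository above for that). The function names (`dropout`, `alignment_loss`, `pairwise_loss`), the cosine-based loss, the symmetric form, and the dropout rate are all assumptions chosen for illustration. The sketch computes only a forward loss value for one user-item pair, and omits the graph encoder, the modality features, and any training loop.

```python
import math
import random


def dropout(vec, rate, rng):
    """Inverted dropout: zero each entry with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def alignment_loss(online, target):
    """Negative-free loss (assumed form): 1 - cos(online, target), in [0, 2]."""
    return 1.0 - cosine(online, target)


def pairwise_loss(user_emb, item_emb, rate=0.3, seed=0):
    """Hypothetical loss for one observed user-item pair.

    Each embedding is perturbed by dropout to form a second view; the
    original user embedding is pulled toward the item's view and vice
    versa. No negative items are sampled anywhere.
    """
    rng = random.Random(seed)
    user_view = dropout(user_emb, rate, rng)
    item_view = dropout(item_emb, rate, rng)
    return 0.5 * (
        alignment_loss(user_emb, item_view)
        + alignment_loss(item_emb, user_view)
    )


if __name__ == "__main__":
    user = [0.2, -0.5, 0.9, 0.1]
    item = [0.3, -0.4, 0.8, 0.0]
    print(pairwise_loss(user, item))
```

The point of the sketch is that the loss depends only on an observed pair and its perturbed views, so the cost of sampling and scoring negative items disappears. In a real model the view side would typically be treated as a fixed target during backpropagation, which this forward-only sketch does not show.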