Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation

Recent years have witnessed growing interest in multimedia recommendation, which aims to predict whether a user will interact with an item that has multimodal content. Previous studies focus on modeling user-item interactions, with multimodal features included as side information. However, this scheme is not well suited to multimedia recommendation. First, item-item relationships are modeled only implicitly and only in their collaborative form, through high-order item-user-item co-occurrences. We argue that the latent semantic item-item structures underlying the multimodal content could help learn better item representations and help recommender models discover candidate items more comprehensively. Second, previous studies disregard fine-grained multimodal fusion. Although access to multiple modalities might allow us to capture rich information, we argue that the coarse-grained fusion by linear combination or concatenation used in previous work is insufficient to fully understand content information and item relationships. To this end, we propose a latent structure MIning with ContRastive mOdality fusion method (MICRO for brevity). Specifically, we devise a novel modality-aware structure learning module, which learns item-item relationships for each modality. Based on the learned modality-aware latent item relationships, we perform graph convolutions that explicitly inject item affinities into modality-aware item representations. Then, we design a novel contrastive method to fuse multimodal features. These enriched item representations can be plugged into existing collaborative filtering methods to make more accurate recommendations. Extensive experiments on real-world datasets demonstrate the superiority of our method over state-of-the-art baselines.
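The pipeline described above (per-modality item-item structure, graph convolution over that structure, and a contrastive objective for fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify how the item graph is built, how modalities are fused, or the form of the contrastive loss, so the cosine kNN graph, the mean fusion, the InfoNCE-style loss, and all names (`knn_item_graph`, `propagate`, `info_nce`, `k`, `temperature`) are assumptions made for illustration.

```python
import numpy as np

def knn_item_graph(features, k):
    """Row-normalized kNN item-item graph for one modality (assumed construction)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                      # cosine similarity between items
    np.fill_diagonal(sim, -np.inf)           # never pick an item as its own neighbor
    # Indices of the k most similar items per row (requires k < number of items).
    top = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    rows = np.arange(sim.shape[0])[:, None]
    adj = np.zeros_like(sim)
    adj[rows, top] = np.clip(sim[rows, top], 0.0, None)  # drop negative affinities
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.clip(deg, 1e-12, None)   # rows sum to 1, or 0 if isolated

def propagate(adj, emb, layers=1):
    """Graph convolution without weights: average neighbor embeddings."""
    for _ in range(layers):
        emb = adj @ emb
    return emb

def info_nce(anchor, positive, temperature=0.5):
    """InfoNCE-style loss: row i of `anchor` should match row i of `positive`."""
    a = anchor / np.clip(np.linalg.norm(anchor, axis=1, keepdims=True), 1e-12, None)
    p = positive / np.clip(np.linalg.norm(positive, axis=1, keepdims=True), 1e-12, None)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_items, dim = 8, 4
    # Hypothetical raw features for two modalities, plus shared item embeddings.
    modalities = {"image": rng.normal(size=(n_items, 16)),
                  "text": rng.normal(size=(n_items, 12))}
    item_emb = rng.normal(size=(n_items, dim))
    per_modality = {m: propagate(knn_item_graph(f, k=3), item_emb)
                    for m, f in modalities.items()}
    fused = np.mean(list(per_modality.values()), axis=0)  # assumed fusion: plain mean
    loss = sum(info_nce(fused, h) for h in per_modality.values())
    print("contrastive fusion loss:", loss)
```

In this sketch the fused item representation is pulled toward each modality-specific view of the same item and pushed away from views of other items. In a trainable version the embeddings and fusion weights would be learned jointly with a collaborative filtering objective, which is omitted here.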
