Item side information, e.g., images and textual descriptions, has been shown to contribute to accurate recommendations. Inspired by the recent success of pre-training models on natural language and images, we propose a pre-training strategy that learns item representations by considering both item side information and item relationships. We relate items by common user activities, e.g., co-purchase, and construct a homogeneous item graph. This graph provides a unified view of item relations and their associated multimodal side information. We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item. The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction. Experimental results on real datasets demonstrate that the proposed PMGT model effectively exploits the multimodal side information to achieve higher accuracy on downstream tasks, including item recommendation, item classification, and click-through rate prediction. We also report a case study that tests the proposed PMGT model in an online setting with 600,000 users.
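To make the graph-construction step concrete, below is a minimal Python sketch of relating items by co-purchase and sampling contextual neighbors. The weighted random-walk sampler is a simplified stand-in for the paper's MCNSampling algorithm, and the toy `purchases` data and all names are illustrative assumptions, not the paper's implementation.

```python
# Sketch: build a homogeneous item graph from co-purchase records and
# sample contextual neighbors per item (simplified stand-in for MCNSampling).
import random
from collections import defaultdict
from itertools import combinations

# user -> set of purchased items (toy data, assumed for illustration)
purchases = {
    "u1": {"i1", "i2", "i3"},
    "u2": {"i2", "i3"},
    "u3": {"i1", "i3", "i4"},
}

# Relate items by common user activity: an edge links two items
# co-purchased by the same user, weighted by co-purchase count.
edge_weight = defaultdict(int)
for items in purchases.values():
    for a, b in combinations(sorted(items), 2):
        edge_weight[(a, b)] += 1

adj = defaultdict(list)
for (a, b), w in edge_weight.items():
    adj[a].append((b, w))
    adj[b].append((a, w))

def sample_contextual_neighbors(item, n_samples=5, walk_len=3):
    """Collect contextual neighbors via short weighted random walks
    (a simplified stand-in for MCNSampling)."""
    neighbors = []
    for _ in range(n_samples):
        node = item
        for _ in range(walk_len):
            if not adj[node]:
                break
            nxt, weights = zip(*adj[node])
            node = random.choices(nxt, weights=weights, k=1)[0]
        if node != item:
            neighbors.append(node)
    return neighbors

print(sample_contextual_neighbors("i1"))
```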
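The two pre-training objectives can also be sketched in code. The following PyTorch snippet is an assumed, minimal rendering (not the paper's code) of 1) graph structure reconstruction as link prediction over item-embedding pairs and 2) masked node feature reconstruction as regression of masked multimodal features; all dimensions, the masking ratio, and the reconstruction head are illustrative assumptions.

```python
# Sketch of the two PMGT pre-training losses (assumed form, not the
# reference implementation). In a real setup, masked nodes' input
# features would be replaced before encoding; here h is random.
import torch
import torch.nn.functional as F

d = 64                       # embedding / feature dimension (assumed)
h = torch.randn(10, d)       # item embeddings from a graph transformer
x = torch.randn(10, d)       # original multimodal node features

# 1) Graph structure reconstruction: binary cross-entropy over
# linked (positive) and unlinked (negative) item pairs (toy indices).
pos = (torch.tensor([0, 1]), torch.tensor([2, 3]))
neg = (torch.tensor([0, 4]), torch.tensor([5, 6]))
pos_logits = (h[pos[0]] * h[pos[1]]).sum(-1)
neg_logits = (h[neg[0]] * h[neg[1]]).sum(-1)
loss_struct = (
    F.binary_cross_entropy_with_logits(pos_logits, torch.ones_like(pos_logits))
    + F.binary_cross_entropy_with_logits(neg_logits, torch.zeros_like(neg_logits))
)

# 2) Masked node feature reconstruction: regress the original features
# of masked nodes from the contextual embeddings.
mask = torch.zeros(10, dtype=torch.bool)
mask[:2] = True                          # mask two nodes (assumed ratio)
recon = torch.nn.Linear(d, d)(h)         # reconstruction head (assumed)
loss_feat = F.mse_loss(recon[mask], x[mask])

loss = loss_struct + loss_feat
```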