This article presents a novel approach to multimodal recommendation systems, focusing on integrating and purifying multimodal data. Our methodology starts by developing a filter to remove noise from various types of data, making the recommendations more reliable. We studied the impact of top-K sparsification on different datasets, finding optimal values that strike a balance between underfitting and overfitting concerns. The study emphasizes the significant role of textual information compared to visual data in providing a deep understanding of items. We conducted sensitivity analyses to understand how different modalities and the use of purifier circle loss affect the efficiency of the model. The findings indicate that systems that incorporate multiple modalities perform better than those relying on just one modality. Our approach highlights the importance of modality purifiers in filtering out irrelevant data, ensuring that user preferences remain relevant. Models without modality purifiers showed reduced performance, emphasizing the need for effective integration of pre-extracted features. The proposed model, which includes an novel self supervised auxiliary task, shows promise in accurately capturing user preferences. The main goal of the fusion technique is to enhance the modeling of user preferences by combining knowledge with item information, utilizing sophisticated language models. Extensive experiments show that our model produces better results than the existing state-of-the-art multimodal recommendation systems.
翻译:暂无翻译