Most existing recommender systems rely only on rating data and ignore other sources of information that might improve the quality of recommendations, such as textual reviews or user and item characteristics. Moreover, the majority of those systems are applicable only to small datasets (with thousands of observations) and cannot handle large datasets (with millions of observations). We propose a recommender algorithm that combines a rating modelling technique (the Latent Factor Model) with a topic modelling method applied to textual reviews (Latent Dirichlet Allocation), and we extend the algorithm so that extra user- and item-specific information can be added to the system. We evaluate the performance of the algorithm on Amazon.com datasets of different sizes, corresponding to 23 product categories. Comparing the resulting model with four other models, we find that combining textual reviews with ratings leads to better recommendations. We also find that adding extra user and item features to the model increases its prediction accuracy, especially for medium and large datasets.
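To make the combination described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard biased latent factor prediction rule with linear terms for the extra user and item features, and it assumes a softmax link (with a hypothetical sharpness parameter `kappa`) that turns an item's latent factors into a topic distribution over its reviews. All function names, parameter names, and the exact form of both equations are illustrative assumptions; the abstract does not specify them.

```python
import math


def predict_rating(mu, b_u, b_i, p_u, q_i, x_u, w_u, x_i, w_i):
    """Predict a rating as global mean + biases + latent interaction
    + linear side-feature terms (assumed form, not taken from the paper)."""
    latent = sum(p * q for p, q in zip(p_u, q_i))      # user-item latent factor dot product
    user_side = sum(w * x for w, x in zip(w_u, x_u))   # extra user features (hypothetical weights)
    item_side = sum(w * x for w, x in zip(w_i, x_i))   # extra item features (hypothetical weights)
    return mu + b_u + b_i + latent + user_side + item_side


def topics_from_factors(q_i, kappa=1.0):
    """Map item latent factors to a topic distribution with a softmax.
    This coupling between factors and LDA topics is an assumption."""
    scaled = [kappa * q for q in q_i]
    m = max(scaled)                                    # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rating = predict_rating(
        mu=3.5, b_u=0.1, b_i=-0.2,
        p_u=[1.0, 0.0], q_i=[0.5, 2.0],
        x_u=[1.0], w_u=[0.25],
        x_i=[2.0], w_i=[0.1],
    )
    print(round(rating, 2))
    print(topics_from_factors([0.5, 2.0]))
```

In a sketch like this, tying the topic distribution to the same item factors used for rating prediction is what would let review text inform the ratings model; the side-feature terms are kept linear only for simplicity.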