This project builds an NLP-based hybrid recommendation system on Yelp data. The system has two major components. The first embeds the review text with a BERT model and a word2vec model. The second is an item-based collaborative filtering algorithm that computes the similarity between reviews within each restaurant category. Using these similarity scores, we recommend to each user the best-matching restaurant based on the reviews they have already written. The code is split into several stages: sample selection and data cleaning, processing, embedding, similarity computation, and prediction and error computation. Because the dataset is large, each stage writes one or more JSON files as intermediate checkpoints, which reduces memory pressure and the amount of data passed between stages.
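The similarity, prediction, and error stages can be illustrated with a minimal sketch. It is not the project's actual code: it assumes each restaurant is represented by a single vector (for example, an average of its review embeddings), uses cosine similarity, predicts a rating as a similarity-weighted average of the user's known ratings, and measures error with RMSE. The function names, the toy vectors, and the ratings are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_rating(target, user_ratings, item_vectors):
    """Similarity-weighted average of the user's ratings for other items.

    Only items with positive similarity to the target contribute.
    Returns None when no rated item is positively similar.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item, rating in user_ratings.items():
        if item == target:
            continue
        sim = cosine_similarity(item_vectors[target], item_vectors[item])
        if sim > 0.0:
            weighted_sum += sim * rating
            weight_total += sim
    return weighted_sum / weight_total if weight_total > 0.0 else None


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


# Toy data, made up for illustration: 2-dimensional "embeddings" per restaurant.
item_vectors = {
    "r1": [1.0, 0.0],
    "r2": [1.0, 0.0],
    "r3": [0.0, 1.0],
}
# Hypothetical star ratings one user has already given.
user_ratings = {"r2": 4.0, "r3": 2.0}

# r1 is identical to r2 and orthogonal to r3, so only r2's rating contributes.
prediction = predict_rating("r1", user_ratings, item_vectors)
```

In a staged pipeline like the one described, the similarity scores would typically be computed once, written to a JSON file, and read back by the prediction stage rather than recomputed per request.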