Movies are a major source of entertainment. However, finding the desired content becomes difficult as the volume of available movies grows significantly every year. Recommender systems (RSs) provide algorithms suited to this problem. The content-based technique has become popular because user data is unavailable in most cases. Content-based recommender systems rely on the similarity of items' demographic information; Term Frequency-Inverse Document Frequency (TF-IDF) and Knowledge Graph Embedding (KGE) are two approaches used to vectorize data so that these similarities can be calculated. In this paper, we propose a weighted content-based movie RS that combines TF-IDF, which is well suited to embedding textual data such as the plot or description, with KGE, which is used to embed named entities such as the director's name. The weights between features are determined using a genetic algorithm. Additionally, an Iranian movies dataset is created by scraping data from movie-related websites. This dataset and the structure of the FarsBase KG are used to create the MovieFarsBase KG, which is a component in the implementation of the proposed content-based RS. Using precision, recall, and F1 score, this study shows that the proposed approach outperforms the conventional approach that uses TF-IDF to embed all attributes.
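To make the weighted combination concrete, the following is a minimal sketch of the general idea, not the implementation used in this paper. It computes a TF-IDF similarity over plot text and a cosine similarity over entity embeddings, then blends them with per-feature weights. The toy plots, the two-dimensional "director" vectors (standing in for trained KGE vectors), and the weights 0.6 and 0.4 are all hypothetical; in the proposed approach the weights would come from a genetic algorithm, which is not shown here.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Build a plain TF-IDF matrix (rows: documents, columns: vocabulary terms)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    mat = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for tok, count in Counter(toks).items():
            tf = count / len(toks)
            # The +1 keeps terms that occur in every document from being zeroed out.
            idf = math.log(n_docs / df[tok]) + 1.0
            mat[i, index[tok]] = tf * idf
    return mat


def cosine_sim(mat):
    """Pairwise cosine similarity between the rows of mat."""
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    unit = mat / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def combined_similarity(sims, weights):
    """Weighted sum of per-feature similarity matrices (weights normalized to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * s for wi, s in zip(w, sims))


def recommend(sim, item, k):
    """Indices of the k items most similar to `item`, excluding the item itself."""
    order = np.argsort(-sim[item])
    return [int(j) for j in order if j != item][:k]


# Hypothetical toy data: three movies with a plot and a director embedding each.
plots = [
    "a detective hunts a killer in tehran",
    "a detective investigates a murder in tehran",
    "two friends open a bakery",
]
# Stand-in for KGE vectors of each movie's director (made up for illustration).
director_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])

plot_sim = cosine_sim(tfidf_matrix(plots))
director_sim = cosine_sim(director_emb)

# Hypothetical fixed weights; a genetic algorithm would search for these instead.
combined = combined_similarity([plot_sim, director_sim], [0.6, 0.4])
top = recommend(combined, item=0, k=1)
print(top)
```

On this toy data the second movie ranks as the closest match to the first, since the two share both plot vocabulary and a nearby director vector. Keeping one similarity matrix per feature, rather than concatenating all vectors, is what lets a search procedure tune the contribution of each attribute independently.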