Context: The number of TV series on offer today is very high. With so many available, numerous series are canceled because a lack of originality leads to low audiences. Problem: A decision support system that can show why some shows succeed and others do not would make it easier to decide whether to renew or start a show. Solution: We studied the series Arrow, broadcast by the CW Network, and used descriptive and predictive modeling techniques to predict each episode's IMDb rating. We assumed that the theme of an episode affects how users rate it, so the dataset consists only of the episode's director, the number of reviews the episode received, the proportion of each theme in the episode as extracted by a Latent Dirichlet Allocation (LDA) model, the number of viewers reported on Wikipedia, and the IMDb rating. LDA is a generative probabilistic model of a collection of documents made up of words. Method: In this prescriptive research, the case study method was used, and its results were analyzed using a quantitative approach. Summary of Results: Using the features of each episode, the best-performing model for predicting the rating was CatBoost: its mean squared error was similar to that of the KNN model, but its standard deviation during the test phase was better. IMDb ratings could be predicted with an acceptable root mean squared error of 0.55.
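The selection criterion described in the results (similar error, better standard deviation) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the `tolerance` threshold, and the per-fold error values are all assumptions made up for illustration, and "better" is read here as "lower".

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def summarize(fold_errors):
    """Return (mean, sample standard deviation) of per-fold errors."""
    return statistics.mean(fold_errors), statistics.stdev(fold_errors)


def pick_model(results, tolerance=0.05):
    """Pick the most stable model among those with a near-best mean error.

    `tolerance` is a hypothetical threshold for what counts as a
    "similar" mean error; the source does not state one.
    """
    summaries = {name: summarize(errs) for name, errs in results.items()}
    best_mean = min(mean for mean, _ in summaries.values())
    close = {n: s for n, s in summaries.items() if s[0] - best_mean <= tolerance}
    # Among the near-best models, prefer the smallest standard deviation.
    return min(close, key=lambda n: close[n][1])


# Made-up per-fold RMSE values, for illustration only (not the paper's results).
illustrative = {
    "model_a": [0.50, 0.60, 0.55, 0.54],
    "model_b": [0.30, 0.80, 0.45, 0.63],
}
print(pick_model(illustrative))
```

Both hypothetical models have nearly the same mean error, so the sketch prefers the one whose errors vary less across folds, which mirrors the reasoning given for preferring CatBoost over KNN.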