Recommender systems have proven to be an effective way to alleviate the over-choice problem and provide accurate, tailored recommendations. However, the sheer number of proposed recommendation algorithms, splitting strategies, evaluation protocols, metrics, and tasks has made rigorous experimental evaluation particularly challenging. Puzzled and frustrated by the continuous re-creation of appropriate evaluation benchmarks, experimental pipelines, hyperparameter optimization, and evaluation procedures, we have developed an exhaustive framework to address these needs. Elliot is a comprehensive recommendation framework that runs and reproduces an entire experimental pipeline by processing a simple configuration file. The framework loads, filters, and splits the data according to a vast set of strategies (13 splitting methods and 8 filtering approaches, from temporal training-test splitting to nested K-fold cross-validation). Elliot optimizes hyperparameters (51 strategies) for several recommendation algorithms (50), selects the best models, compares them with the baselines while providing intra-model statistics, computes 36 metrics spanning accuracy, beyond-accuracy, bias, and fairness, and conducts statistical analysis (Wilcoxon and paired t-tests). The aim is to provide researchers with a tool that eases, and makes reproducible, every phase of the experimental evaluation, from data reading to results collection. Elliot is available on GitHub (https://github.com/sisinflab/elliot).
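To illustrate the configuration-driven workflow described above, the sketch below shows what a single-file experiment specification might look like. This is a hedged illustration only: the key names (`data_config`, `test_splitting`, `simple_metrics`), the dataset path, and the `ItemKNN` hyperparameter grid are assumptions chosen for the example, not verbatim from Elliot's documentation, which should be consulted for the actual schema.

```yaml
# Hypothetical Elliot-style experiment configuration (illustrative sketch).
# One file declares data loading, splitting, models, tuning, and metrics.
experiment:
  dataset: movielens_1m                  # assumed dataset name
  data_config:
    strategy: dataset
    dataset_path: ../data/movielens_1m/dataset.tsv
  splitting:
    test_splitting:
      strategy: temporal_hold_out        # one of the temporal splitting strategies
      test_ratio: 0.2
  models:
    ItemKNN:                             # one of the implemented recommenders
      meta:
        hyper_opt_alg: grid              # assumed hyperparameter search setting
      neighbors: [50, 100, 200]          # grid of values to explore
      similarity: cosine
  evaluation:
    simple_metrics: [nDCG, Precision, Recall]
  top_k: 10
```

Under this design, rerunning the whole pipeline (and reproducing someone else's experiment) reduces to sharing and re-processing one declarative file rather than re-implementing splitting, tuning, and evaluation code.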