While other areas of machine learning have seen more and more automation, designing a high-performing recommender system still requires a high level of human effort. Furthermore, recent work has shown that modern recommender system algorithms do not always improve over well-tuned baselines. A natural follow-up question is, "How do we choose the right algorithm for a new dataset and performance metric?" In this work, we start by presenting the first large-scale study of recommender system approaches, comparing 18 algorithms and 100 sets of hyperparameters across 85 datasets and 315 metrics. We find that the best algorithms and hyperparameters are highly dependent on the dataset and performance metric; however, there are also strong correlations between the performance of each algorithm and various meta-features of the datasets. Motivated by these findings, we create RecZilla, a meta-learning approach to recommender systems that uses a model to predict the best algorithm and hyperparameters for new, unseen datasets. By using far more meta-training data than prior work, RecZilla is able to substantially reduce the level of human involvement when faced with a new recommender system application. We release not only our code and pretrained RecZilla models, but also all of our raw experimental results, so that practitioners can train a RecZilla model for their desired performance metric: https://github.com/naszilla/reczilla.
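To make the meta-learning idea concrete, the following is a minimal sketch of algorithm selection from dataset meta-features. It is not the RecZilla implementation: the nearest-neighbor predictor is a stand-in chosen for brevity, and the names `META_TRAIN`, `predict_scores`, and `select_algorithm`, the three meta-features, the algorithm labels, and every number are hypothetical and do not come from the paper's results.

```python
import math

# Hypothetical meta-training data: one row per previously evaluated dataset.
# Each row pairs dataset meta-features with the metric value that each
# algorithm achieved on it. All values are invented for illustration.
META_TRAIN = [
    # (log10 users, log10 items, interaction density), {algorithm: metric}
    ((3.0, 3.2, 0.060), {"ItemKNN": 0.31, "MF": 0.28, "Popularity": 0.20}),
    ((3.1, 3.0, 0.050), {"ItemKNN": 0.30, "MF": 0.27, "Popularity": 0.19}),
    ((5.0, 4.5, 0.001), {"ItemKNN": 0.10, "MF": 0.18, "Popularity": 0.09}),
    ((5.2, 4.8, 0.002), {"ItemKNN": 0.11, "MF": 0.20, "Popularity": 0.10}),
]


def predict_scores(meta_features, meta_train=META_TRAIN, k=2):
    """Predict each algorithm's metric on a new dataset.

    Assumption: a plain k-nearest-neighbor average over unscaled
    meta-features, used here only as a simple stand-in meta-model.
    """
    nearest = sorted(
        meta_train, key=lambda row: math.dist(row[0], meta_features)
    )[:k]
    algorithms = nearest[0][1].keys()
    return {
        name: sum(perf[name] for _, perf in nearest) / len(nearest)
        for name in algorithms
    }


def select_algorithm(meta_features, **kwargs):
    """Return the algorithm with the highest predicted metric."""
    scores = predict_scores(meta_features, **kwargs)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    new_dataset = (5.1, 4.6, 0.0015)  # hypothetical unseen dataset
    print(select_algorithm(new_dataset), predict_scores(new_dataset))
```

In this sketch the selection step costs one pass over the meta-training rows, so no recommender needs to be trained on the new dataset before an algorithm is chosen. A practical meta-model would also need to scale the meta-features and to predict hyperparameters, both of which are omitted here.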