Recommender Systems today are still mostly evaluated in terms of accuracy, while other aspects beyond the immediate relevance of recommendations, such as diversity, long-term user retention and fairness, often take a back seat. Moreover, reconciling multiple performance perspectives is by definition indeterminate, presenting a stumbling block for those pursuing a rounded evaluation of Recommender Systems. EvalRS 2022 -- a data challenge designed around Multi-Objective Evaluation -- was a first practical endeavour in this direction, providing many insights into the requirements and challenges of balancing multiple objectives in evaluation. In this work, we reflect on EvalRS 2022 and expound upon its key lessons to formulate a first-principles approach to Multi-Objective model selection. We also outline a set of guidelines for carrying out a Multi-Objective Evaluation challenge, with potential applicability to the rounded evaluation of competing models in real-world deployments.