Imagine you are a teacher trying to assess a student's level in a particular subject. If you design a test with only hard questions and the student fails, this mostly shows that the student does not understand the more advanced material. A more insightful exam would include questions of varying type and difficulty, revealing the student's strengths and weaknesses from different perspectives. In the field of Recommender Systems (RS), we usually design evaluations to measure an algorithm's ability to optimize goals in complex scenarios that represent the real-world challenges the system will most likely face. This paper posits, however, that testing an algorithm's ability to address both simple and complex tasks would offer a more detailed view of its performance, helping to identify at a more granular level the strengths and weaknesses of solutions across different scenarios and domains. We believe the RS community would greatly benefit from a collection of standardized, simple, and targeted experiments which, much like a suite of "unit tests", would each assess an algorithm's ability to tackle one of the core challenges that make up complex RS tasks. Unlike traditional unit tests, these experiments go beyond a pass/fail outcome: running an algorithm against the collection allows a researcher to empirically analyze in which types of settings the algorithm performs best, and to what degree, under different metrics. Beyond defending this position, this paper also proposes how such simple and targeted experiments could be defined and shared, and suggests potential next steps to make this project a reality.
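To make the "graded unit test" idea more concrete, here is a minimal sketch, assuming a Python harness and two toy datasets of our own invention. The experiment names (`shared_taste`, `niche_user`), the data, the `most_popular` baseline, the `run_suite` function, and the two metrics are all hypothetical illustrations, not part of the proposal itself. Each experiment isolates one challenge with a tiny dataset, and the suite reports a score per experiment and per metric instead of a single pass/fail verdict.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Interactions = List[Tuple[str, str]]  # (user, item) pairs
Recommender = Callable[[Interactions, str, int], List[str]]


def most_popular(train: Interactions, user: str, k: int) -> List[str]:
    """Baseline: the k most frequent training items the user has not interacted with."""
    seen = {item for u, item in train if u == user}
    counts = Counter(item for _, item in train)
    return [item for item, _ in counts.most_common() if item not in seen][:k]


# Hypothetical "unit test" experiments: (training interactions, held-out relevant items per user).
EXPERIMENTS: Dict[str, Tuple[Interactions, Dict[str, Set[str]]]] = {
    # Everyone shares the same taste: popularity alone should suffice.
    "shared_taste": (
        [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u3", "b"), ("u3", "c")],
        {"u1": {"c"}, "u2": {"b"}, "u3": {"a"}},
    ),
    # A minority group (u4, u5) likes items the majority ignores.
    "niche_user": (
        [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"),
         ("u4", "x"), ("u4", "y"), ("u5", "x")],
        {"u3": {"b"}, "u5": {"y"}},
    ),
}


def run_suite(rec: Recommender, k: int = 1) -> Dict[str, Dict[str, float]]:
    """Score `rec` on every experiment under several metrics rather than pass/fail."""
    report: Dict[str, Dict[str, float]] = {}
    for name, (train, test) in EXPERIMENTS.items():
        catalog = {item for _, item in train}
        recommended: Set[str] = set()
        hits = 0
        for user, relevant in test.items():
            recs = rec(train, user, k)
            recommended.update(recs)
            hits += bool(relevant.intersection(recs))  # 1 if any relevant item is in the top-k
        report[name] = {
            f"hit_rate@{k}": hits / len(test),                    # share of test users with a hit
            "catalog_coverage": len(recommended) / len(catalog),  # share of items ever recommended
        }
    return report


if __name__ == "__main__":
    for experiment, scores in run_suite(most_popular).items():
        print(experiment, scores)
```

On these toy data the popularity baseline reaches a hit rate and catalog coverage of 1.0 on `shared_taste`, but only 0.5 on both metrics for `niche_user`. This is the kind of graded, per-setting profile the suite is meant to expose, rather than a single pass or fail.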