Modern recommender systems face an increasing need to explain their recommendations. Despite considerable progress in this area, evaluating the quality of explanations remains a significant challenge for researchers and practitioners. Prior work mainly conducts human studies to evaluate explanation quality, which are usually expensive, time-consuming, and prone to human bias. In this paper, we propose an offline evaluation method that can be computed without human involvement. To evaluate an explanation, our method quantifies its counterfactual impact on the recommendation. To validate the effectiveness of our method, we carry out an online user study. We show that, compared to conventional methods, our method produces evaluation scores that correlate more strongly with real human judgments, and can therefore serve as a better proxy for human evaluation. In addition, we show that explanations with high evaluation scores are judged better by humans. Our findings highlight counterfactual evaluation as a promising direction for assessing recommendation explanations.
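To make the core idea concrete, the sketch below illustrates one way counterfactual impact could be computed: remove the inputs an explanation points to, re-score the recommended item, and treat the score drop as the explanation's quality. This is a minimal illustration, not the paper's actual method; the names (counterfactual_impact, toy_score, and the scoring interface) are assumptions introduced here.

```python
# Illustrative counterfactual scoring of a recommendation explanation.
# Assumption: an explanation is represented as the subset of the user's
# history features it cites, and the recommender exposes a scoring
# function score_item(history, item_attrs) -> float. Both are
# hypothetical interfaces, not the paper's API.

def counterfactual_impact(score_item, user_history, item_attrs, explanation_features):
    """Score drop when the features cited by the explanation are removed."""
    original = score_item(user_history, item_attrs)
    # Counterfactual input: the user's history with the explained features masked.
    masked_history = [f for f in user_history if f not in explanation_features]
    counterfactual = score_item(masked_history, item_attrs)
    # A larger drop means the explanation cites inputs the
    # recommendation actually depends on.
    return original - counterfactual

# Toy scoring function: overlap between history and item attributes
# (purely illustrative stand-in for a real recommender).
def toy_score(history, item_attrs):
    return len(set(history) & set(item_attrs)) / max(len(item_attrs), 1)

history = ["sci-fi", "space", "drama", "thriller"]
item = ["sci-fi", "space", "aliens"]
# Explanation "recommended because you like sci-fi and space" scores 2/3:
# removing those features drops the item's score from 2/3 to 0.
print(counterfactual_impact(toy_score, history, item, {"sci-fi", "space"}))
```

Under this reading, an explanation citing features the model never used would yield an impact near zero, matching the intuition that such an explanation is unfaithful.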