Recently, research on explainable recommender systems has drawn much attention from both academia and industry, resulting in a variety of explainable models. As a consequence, their evaluation approaches vary from model to model, which makes it quite difficult to compare the explainability of different models. To achieve a standard way of evaluating recommendation explanations, we provide three benchmark datasets for EXplanaTion RAnking (denoted as EXTRA), on which explainability can be measured by ranking-oriented metrics. Constructing such datasets, however, poses great challenges. First, user-item-explanation triplet interactions are rare in existing recommender systems, so finding alternatives becomes a challenge. Our solution is to identify nearly identical sentences in user reviews. This idea then leads to the second challenge, i.e., how to efficiently group the sentences in a dataset, since estimating the similarity between every pair of sentences incurs quadratic runtime complexity. To mitigate this issue, we provide a more efficient method based on Locality Sensitive Hashing (LSH) that can detect near-duplicates in sub-linear time for a given query. Moreover, we make our code publicly available to allow researchers in the community to create their own datasets.
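To make the LSH idea concrete, the following is a minimal, self-contained sketch (not the released EXTRA code) of near-duplicate sentence grouping via MinHash signatures and banded LSH buckets; the shingle size, signature length, band count, and example sentences are illustrative assumptions, not parameters from the paper.

```python
import hashlib
import re
from collections import defaultdict

NUM_HASHES = 64   # length of each MinHash signature (illustrative)
NUM_BANDS = 16    # signature is split into 16 bands of 4 rows each

def shingles(sentence, k=3):
    """Lowercase word-level k-shingles of a sentence."""
    words = re.findall(r"\w+", sentence.lower())
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def minhash(shingle_set):
    """MinHash signature: for each seeded hash function, keep the
    minimum hash value over all shingles of the sentence."""
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set)
        for seed in range(NUM_HASHES)
    ]

def lsh_buckets(signatures):
    """Band each signature: sentences sharing any band value land in the
    same bucket and become candidate near-duplicates, so a query only
    compares against its bucket mates rather than the whole corpus."""
    rows = NUM_HASHES // NUM_BANDS
    buckets = defaultdict(set)
    for sent_id, sig in signatures.items():
        for b in range(NUM_BANDS):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(sent_id)
    return buckets

# Toy review sentences (illustrative only).
sentences = {
    0: "The battery life is really great.",
    1: "Battery life is really great!",
    2: "The hotel staff were rude at check-in.",
}
signatures = {i: minhash(shingles(s)) for i, s in sentences.items()}
buckets = lsh_buckets(signatures)

# Buckets with more than one sentence are candidate near-duplicate groups;
# sentences 0 and 1 will very likely share a bucket, sentence 2 will not.
groups = [ids for ids in buckets.values() if len(ids) > 1]
print(groups)
```

Candidates found this way would still be verified with an exact similarity check, but only within buckets, which is what keeps query time sub-linear in practice.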