Explainable Information Retrieval (XIR) is a growing research area focused on enhancing transparency and trustworthiness of the complex decision-making processes taking place in modern information retrieval systems. While there has been progress in developing XIR systems, empirical evaluation tools to assess the degree of explainability attained by such systems are lacking. To close this gap and gain insights into the true merit of XIR systems, we extend existing insights from a factor analysis of search explainability to introduce SSE (Search System Explainability), an evaluation metric for XIR search systems. Through a crowdsourced user study, we demonstrate SSE's ability to distinguish between explainable and non-explainable systems, showing that systems with higher scores indeed indicate greater interpretability. Additionally, we observe comparable perceived temporal demand and performance levels between non-native and native English speakers. We hope that aside from these concrete contributions to XIR, this line of work will serve as a blueprint for similar explainability evaluation efforts in other domains of machine learning and natural language processing.
翻译:暂无翻译