Information retrieval (IR) systems have become an integral part of our everyday lives. As search engines, recommender systems, and conversational agents are employed across various domains, from recreational search to clinical decision support, there is an increasing need for transparent and explainable systems to guarantee accountable, fair, and unbiased results. Despite many recent advances in explainable AI and IR techniques, there is no consensus on what it means for a system to be explainable. Although a growing body of literature suggests that explainability comprises multiple subfactors, virtually all existing approaches treat it as a singular notion. In this paper, we examine explainability in Web search systems, leveraging psychometrics and crowdsourcing to identify human-centered factors of explainability. Based on these factors, we establish a continuous-scale evaluation instrument for explainable search systems that allows researchers and practitioners to trade off performance and explainability in a more flexible manner than was previously possible.