Feature attributions are a commonly used explanation type, when we want to posthoc explain the prediction of a trained model. Yet, they are not very well explored in IR. Importantly, feature attribution has rarely been rigorously defined, beyond attributing the most important feature the highest value. What it means for a feature to be more important than others is often left vague. Consequently, most approaches focus on just selecting the most important features and under utilize or even ignore the relative importance within features. In this work, we rigorously define the notion of feature attribution for ranking models, and list essential properties that a valid attribution should have. We then propose RankingSHAP as a concrete instantiation of a list-wise ranking attribution method. Contrary to current explanation evaluation schemes that focus on selections, we propose two novel evaluation paradigms for evaluating attributions over learning-to-rank models. We evaluate RankingSHAP for commonly used learning-to-rank datasets to showcase the more nuanced use of an attribution method while highlighting the limitations of selection-based explanations. In a simulated experiment we design an interpretable model to demonstrate how list-wise ranking attributes can be used to investigate model decisions and evaluate the explanations qualitatively. Because of the contrastive nature of the ranking task, our understanding of ranking model decisions can substantially benefit from feature attribution explanations like RankingSHAP.
翻译:暂无翻译