While the interpretability of machine learning models is often equated with their mere syntactic comprehensibility, we think that interpretability goes beyond that and that human interpretability should also be investigated from the point of view of cognitive science. The goal of this paper is to discuss to what extent cognitive biases may affect human understanding of interpretable machine learning models, in particular of logical rules discovered from data. Twenty cognitive biases are covered, as are possible debiasing techniques that can be adopted by designers of machine learning algorithms and software. Our review transfers results obtained in cognitive psychology to the domain of machine learning, aiming to bridge the current gap between these two areas. This review needs to be followed by empirical studies specifically focused on the machine learning domain.