For machine learning models to be most useful in numerous sociotechnical systems, many have argued that they must be human-interpretable. However, despite increasing interest in interpretability, there remains no firm consensus on how to measure it. This is especially true in representation learning, where interpretability research has focused on "disentanglement" measures that apply only to synthetic datasets and are not grounded in human factors. We introduce a task to quantify the human-interpretability of generative model representations, in which users interactively modify representations to reconstruct target instances. On synthetic datasets, we find that performance on this task differentiates entangled and disentangled models far more reliably than baseline approaches. On a real dataset, we find that it differentiates between representation-learning methods widely believed, but never previously shown, to produce more or less interpretable models. In both cases, we ran small-scale think-aloud studies and large-scale experiments on Amazon Mechanical Turk to confirm that our qualitative and quantitative results agreed.
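To make the interactive reconstruction task concrete, the sketch below is a minimal, hypothetical simulation rather than the paper's actual study interface: a stand-in linear decoder plays the role of the generative model, and a greedy simulated "user" adjusts one representation dimension at a time (as with a slider) to match a target instance, with final reconstruction error serving as the task-performance measure. All names and the simulated-user heuristic are assumptions for illustration.

```python
# Minimal sketch of the interactive-reconstruction task (illustrative only).
# The greedy "simulated user" and the linear decoder are assumptions;
# the actual study uses human participants and trained generative models.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "generative model": a fixed linear decoder from a 4-D representation.
W = rng.normal(size=(16, 4))
decode = lambda z: W @ z

# Target instance the participant must reconstruct, and its true representation.
z_target = rng.normal(size=4)
x_target = decode(z_target)

def reconstruction_error(z):
    """Distance between the decoded candidate and the target instance."""
    return np.linalg.norm(decode(z) - x_target)

# Simulated participant: repeatedly tweak one dimension at a time
# and keep only changes that move the output closer to the target.
z = np.zeros(4)
for step in range(200):
    dim = rng.integers(4)
    candidate = z.copy()
    candidate[dim] += rng.normal(scale=0.5)
    if reconstruction_error(candidate) < reconstruction_error(z):
        z = candidate

# Task performance: how closely the interactively chosen representation
# reconstructs the target (lower error suggests an easier-to-use representation).
print(f"final reconstruction error: {reconstruction_error(z):.3f}")
```

In the real study, the "user" is a human manipulating the representation dimensions directly, so task performance reflects how readily people can understand and control each dimension; the simulation above only illustrates the loop structure and the error-based performance metric.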