Concept discovery is an open problem in the interpretability literature, and progress on it is important for bridging the gap between non-deep-learning experts and model end-users. Among current formulations, one defines a concept as a direction in a learned representation space. This definition makes it possible to evaluate whether a particular concept significantly influences classification decisions for classes of interest. However, finding relevant concepts is tedious, as representation spaces are high-dimensional and hard to navigate. Current approaches include hand-crafting concept datasets and then converting them to latent space directions; alternatively, the process can be automated by clustering the latent space. In this study, we offer two further approaches to guide user discovery of meaningful concepts, one based on multiple hypothesis testing and another on interactive visualization. We explore the potential value and limitations of these approaches through simulation experiments and a demo visual interface applied to real data. Overall, we find that these techniques offer a promising strategy for discovering relevant concepts in settings where users do not have predefined descriptions of them, without completely automating the process.
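To make the "concept as a direction" formulation concrete, the following is a minimal sketch, not the method used in this study. It assumes a concept direction can be approximated by the difference between mean activations of concept examples and random examples, that influence on a class is measured by the directional derivative of that class's logit, and that candidate directions are screened with the Benjamini-Hochberg procedure. The abstract does not name a specific testing procedure, so that choice, along with every function name below, is hypothetical.

```python
import numpy as np


def concept_direction(concept_acts, random_acts):
    """Unit vector pointing from the mean random activation to the mean concept activation.

    Assumption: a difference of means stands in for the concept direction.
    """
    v = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return v / np.linalg.norm(v)


def directional_sensitivity(grads, direction):
    """Directional derivative of a class logit along `direction`, one value per example.

    `grads` holds the gradient of the logit with respect to the activations,
    with shape (n_examples, n_features).
    """
    return grads @ direction


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected at false discovery rate `alpha`.

    Assumption: Benjamini-Hochberg is used here only as one common
    multiple-testing correction.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank that meets its threshold
        rejected[order[: k + 1]] = True
    return rejected
```

In such a setup, each candidate direction would contribute one p-value (for example, from a test of whether its sensitivities differ from zero), and the correction limits how many spurious concepts are surfaced when many directions are screened at once.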