Interest in understanding and factorizing learned embedding spaces through conceptual explanations is steadily growing. When no human concept labels are available, concept discovery methods search trained embedding spaces for interpretable concepts like object shape or color that can be used to provide post-hoc explanations for decisions. Unlike previous work, we argue that concept discovery should be identifiable, meaning that a number of known concepts can be provably recovered to guarantee reliability of the explanations. As a starting point, we explicitly make the connection between concept discovery and classical methods like Principal Component Analysis and Independent Component Analysis by showing that they can recover independent concepts with non-Gaussian distributions. For dependent concepts, we propose two novel approaches that exploit functional compositionality properties of image-generating processes. Our provably identifiable concept discovery methods substantially outperform competitors on a battery of experiments including hundreds of trained models and dependent concepts, where they exhibit up to 29% better alignment with the ground truth. Our results provide a rigorous foundation for reliable concept discovery without human labels.
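To illustrate the classical identifiability result the abstract refers to, the sketch below mixes two independent non-Gaussian "concepts" with a linear map (standing in for a learned embedding) and recovers them with Independent Component Analysis. This is a minimal, hypothetical example using scikit-learn's FastICA, not the paper's method; the source distributions, mixing matrix, and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 5000

# Two independent non-Gaussian ground-truth "concepts"
# (uniform and Laplace distributed); illustrative choice.
S = np.column_stack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])

# Linear mixing matrix standing in for an embedding map (assumed).
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = S @ A.T  # observed "embeddings"

# ICA recovers the sources up to permutation, sign, and scale.
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
S_hat = ica.fit_transform(X)

# Measure alignment: each true concept should correlate strongly
# with exactly one recovered component.
corr = np.abs(np.corrcoef(S.T, S_hat.T)[:2, 2:])
print(corr.max(axis=1))  # near 1.0 for both concepts
```

Because the sources here are non-Gaussian, ICA identifies them; with Gaussian sources the rotation would be unidentifiable, which is exactly the distinction the abstract draws on.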