Interest in understanding and factorizing learned embedding spaces is growing. For instance, recent concept-based explanation techniques analyze a machine learning model in terms of interpretable latent components. Such components have to be discovered in the model's embedding space, e.g., through independent component analysis (ICA) or modern disentanglement learning techniques. While these unsupervised approaches offer a sound formal framework, they either require access to a data generating function or impose rigid assumptions on the data distribution, such as independence of the components, that are often violated in practice. In this work, we link conceptual explainability for vision models with disentanglement learning and ICA. This enables us to provide the first theoretical results on how components can be identified without requiring any distributional assumptions. From these insights, we derive the disjoint attributions (DA) concept discovery method, which is applicable to a broader class of problems than current approaches yet possesses a formal identifiability guarantee. In an extensive comparison against component analysis and over 300 state-of-the-art disentanglement models, DA consistently maintains superior performance, even under varying distributions and correlation strengths.
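As a minimal sketch of the kind of unsupervised component discovery referenced above, the snippet below applies ICA to embedding vectors to obtain candidate concept directions; the data shapes and variable names are placeholders for illustration, not the paper's implementation, and scikit-learn's FastICA is used as a stand-in for any ICA solver.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Placeholder setup: `embeddings` holds feature vectors extracted from a
# vision model, one row per image (synthetic data, not the paper's setup).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 128))  # e.g., 1000 images, 128-dim features

# Unsupervised component discovery via ICA: each recovered component is a
# candidate "concept" direction in embedding space. Note that ICA assumes the
# underlying components are statistically independent -- the kind of
# distributional assumption the abstract notes is often violated in practice.
ica = FastICA(n_components=10, random_state=0)
concept_activations = ica.fit_transform(embeddings)  # shape (1000, 10)
concept_directions = ica.mixing_                     # shape (128, 10)
```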