Concept-driven counterfactuals explain classifier decisions by altering model predictions through semantic changes. In this paper, we present a novel approach that leverages cross-modal decompositionality and image-specific concepts to create counterfactual scenarios expressed in natural language. We apply the proposed interpretability framework, termed Decompose and Explain (DeX), to the challenging domain of image privacy decisions, which are contextual and subjective. This application enables quantifying the differential contributions of key scene elements to the model prediction. We identify relevant decision factors via a multi-criterion selection mechanism that considers both image similarity, to favor minimal perturbations, and decision confidence, to prioritize impactful changes. This approach evaluates and compares diverse explanations, and assesses the interdependency and mutual influence among explanatory properties. By leveraging image-specific concepts, DeX generates image-grounded, sparse explanations, yielding significant improvements over the state of the art. Importantly, DeX operates as a training-free framework, offering high flexibility. Results show that DeX not only uncovers the principal factors influencing subjective decisions, but also identifies underlying dataset biases, enabling targeted mitigation strategies to improve fairness.
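To make the multi-criterion selection concrete, below is a minimal sketch of how candidate counterfactuals could be ranked by trading off similarity to the original image (minimal perturbation) against the classifier's confidence in the flipped decision (impact). The names `Candidate`, `select_counterfactual`, and the weighting parameter `alpha` are illustrative assumptions, not identifiers from the paper; DeX may combine the criteria differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """A counterfactual scenario produced by swapping one image-specific concept."""
    caption: str        # natural-language counterfactual description
    similarity: float   # similarity to the original image/description, in [0, 1]
    confidence: float   # classifier confidence for the flipped label, in [0, 1]


def select_counterfactual(candidates: Sequence[Candidate],
                          alpha: float = 0.5) -> Candidate:
    """Return the candidate with the best similarity/impact trade-off.

    `alpha` weights minimal perturbation against decision confidence;
    it is a hypothetical knob, not a parameter documented in the paper.
    """
    def score(c: Candidate) -> float:
        return alpha * c.similarity + (1.0 - alpha) * c.confidence

    return max(candidates, key=score)


# Usage: rank two concept substitutions for a photo classified as "private".
candidates = [
    Candidate("a person at a desk with a blank screen", similarity=0.91, confidence=0.55),
    Candidate("an empty desk in an office", similarity=0.74, confidence=0.88),
]
print(select_counterfactual(candidates).caption)
```

A scalarized score is only one way to realize such a mechanism; a Pareto-style selection over the two criteria would serve the same purpose of balancing sparsity against decisiveness.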