The success of deep neural nets heavily relies on their ability to encode complex relations between their input and their output. While this property serves to fit the training data well, it also obscures the mechanism that drives prediction. This study aims to reveal hidden concepts by employing an intervention mechanism, based on discrete variational autoencoders, that shifts the predicted class. An explanatory model then visualizes the encoded information from any hidden layer together with its corresponding intervened representation. By assessing the differences between the original and the intervened representations, one can determine the concepts that are able to alter the class, hence providing interpretability. We demonstrate the effectiveness of our approach on CelebA, where we show various visualizations of bias in the data and suggest different interventions to reveal and change the bias.
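The following is a minimal, hypothetical PyTorch sketch of the intervention idea outlined above: a discrete VAE encodes a classifier's hidden representation into concept codes, one code is flipped (the intervention), and the original and intervened representations are compared through the classifier. The module names, the Gumbel-Softmax relaxation, and all dimensions are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch only: ConceptVAE, intervene, and the Gumbel-Softmax
# discrete latent are assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptVAE(nn.Module):
    """Discrete VAE over a classifier's hidden representation h."""
    def __init__(self, h_dim=128, n_codes=16, n_cats=8):
        super().__init__()
        self.n_codes, self.n_cats = n_codes, n_cats
        self.enc = nn.Linear(h_dim, n_codes * n_cats)   # logits for each discrete code
        self.dec = nn.Linear(n_codes * n_cats, h_dim)   # reconstruct the hidden representation

    def encode(self, h, tau=0.5):
        logits = self.enc(h).view(-1, self.n_codes, self.n_cats)
        # Straight-through Gumbel-Softmax gives (near) one-hot discrete codes
        return F.gumbel_softmax(logits, tau=tau, hard=True)

    def decode(self, z):
        return self.dec(z.flatten(1))

def intervene(z, code_idx, new_cat):
    """Flip one discrete code to a different category (the 'intervention')."""
    z = z.clone()
    z[:, code_idx, :] = 0.0
    z[:, code_idx, new_cat] = 1.0
    return z

# Toy usage: a frozen classifier head on top of the (reconstructed) hidden layer.
h_dim, n_classes = 128, 2
classifier_head = nn.Linear(h_dim, n_classes)
vae = ConceptVAE(h_dim=h_dim)

h = torch.randn(4, h_dim)                 # hidden representations of 4 inputs
z = vae.encode(h)                         # discrete concept codes
h_rec = vae.decode(z)                     # original (reconstructed) representation
h_int = vae.decode(intervene(z, code_idx=3, new_cat=5))  # intervened representation

# If the predicted class changes, code 3 carries a class-altering concept;
# the difference between h_rec and h_int is what an explanatory model would visualize.
print(classifier_head(h_rec).argmax(1), classifier_head(h_int).argmax(1))
```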