Instance-based interpretation methods have been widely studied for supervised learning, as they help explain how black-box neural networks make predictions. However, instance-based interpretations remain ill-understood in the context of unsupervised learning. In this paper, we investigate influence functions [20], a popular instance-based interpretation method, for a class of deep generative models called variational auto-encoders (VAEs). We formally frame the counterfactual question answered by influence functions in this setting, and through theoretical analysis, examine what they reveal about the impact of training samples on classical unsupervised learning methods. We then introduce VAE-TracIn, a computationally efficient and theoretically sound solution based on Pruthi et al. [28], for VAEs. Finally, we evaluate VAE-TracIn on several real-world datasets with extensive quantitative and qualitative analysis.
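To make the idea concrete, below is a minimal sketch of a TracIn-style influence score [28] applied with the VAE training loss (the negative ELBO). The helper names `checkpoints` (a list of saved models with their learning rates) and `neg_elbo` are hypothetical placeholders, not the paper's implementation; the sketch only illustrates the checkpoint-summed gradient dot product that TracIn is built on.

```python
# A minimal sketch of TracIn-style influence for a VAE, assuming:
# - `checkpoints` is a list of (model, lr) pairs saved during training
#   (hypothetical; the paper's exact checkpointing scheme may differ);
# - `neg_elbo(model, x)` returns the VAE training loss on sample x,
#   i.e. the negative ELBO (reconstruction term plus KL divergence).
import torch

def flat_grad(loss, model):
    """Gradient of `loss` w.r.t. all model parameters, flattened into one vector."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def vae_tracin(checkpoints, neg_elbo, z_train, x_test):
    """Approximate influence of training sample z_train on test sample x_test:
    a sum over checkpoints of lr * <grad L(z_train), grad L(x_test)>."""
    score = 0.0
    for model, lr in checkpoints:
        g_train = flat_grad(neg_elbo(model, z_train), model)
        g_test = flat_grad(neg_elbo(model, x_test), model)
        score += lr * torch.dot(g_train, g_test).item()
    return score
```

A large positive score suggests that gradient steps on `z_train` tended to decrease the loss on `x_test` over training, i.e. `z_train` was a helpful sample for `x_test`; a negative score suggests the opposite.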