The ability of Variational Autoencoders to learn disentangled representations has made them appealing for practical applications. However, their mean representations, which are generally used for downstream tasks, have recently been shown to be more correlated than their sampled counterparts, on which disentanglement is usually measured. In this paper, we refine this observation through the lens of selective posterior collapse, which states that only a subset of the learned representations, the active variables, encodes useful information while the rest (the passive variables) is discarded. We first extend the existing definition to multiple data examples and show that active variables are equally disentangled in mean and sampled representations. Based on this extension and the pre-trained models from disentanglement_lib, we then isolate the passive variables and show that they are responsible for the discrepancies between mean and sampled representations. Specifically, passive variables exhibit high correlation scores with other variables in mean representations while being fully uncorrelated in sampled ones. We thus conclude that, despite what their higher correlation might suggest, mean representations remain good candidates for downstream tasks. However, it may be beneficial to remove their passive variables, especially when they are used with models sensitive to correlated features.
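To make the mean/sampled distinction and the passive-variable effect concrete, the following is a minimal NumPy sketch, not the paper's code: it builds toy encoder outputs in which one latent dimension mimics a passive variable (near-zero KL to the prior, mean weakly tied to an active dimension), then compares correlation in mean versus sampled representations. The KL-based passivity criterion, the threshold, and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 4

# Toy stand-ins for encoder outputs mu, logvar on a batch of n examples.
mu = rng.standard_normal((n, d))          # active means carry the signal
logvar = np.full((n, d), -4.0)            # active dims: low posterior variance
mu[:, 3] = 0.01 * mu[:, 0]                # passive mean: tiny residual tied to dim 0
logvar[:, 3] = 0.0                        # passive dim: posterior variance ~ prior

# Reparameterized samples: z = mu + sigma * eps, eps ~ N(0, I).
eps = rng.standard_normal((n, d))
z = mu + np.exp(0.5 * logvar) * eps

# Per-dimension KL(q(z|x) || N(0, I)), averaged over the batch; a passive
# variable has KL ~ 0 (assumed threshold of 0.01 below).
kl = 0.5 * np.mean(mu**2 + np.exp(logvar) - logvar - 1.0, axis=0)
print("per-dim KL:", np.round(kl, 3))
print("passive dims:", np.where(kl < 0.01)[0])

# The passive dimension is highly correlated with dim 0 in mean
# representations, but its near-unit sampling noise washes this out.
corr_mean = np.corrcoef(mu, rowvar=False)
corr_samp = np.corrcoef(z, rowvar=False)
print("corr(mu_0, mu_3):", round(corr_mean[0, 3], 3))  # ~1.0
print("corr(z_0,  z_3):", round(corr_samp[0, 3], 3))   # ~0.0
```

Dropping the flagged dimensions before fitting a downstream model, as the abstract suggests, amounts to `mu[:, kl >= 0.01]` in this toy setting.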