In this paper, we provide the first focused study of discontinuities (a.k.a. holes) in the latent space of Variational Auto-Encoders (VAEs), a phenomenon that has been shown to be detrimental to model capacity. Existing work on latent holes is centred exclusively on the encoder network and merely establishes that holes exist. We address these limitations by proposing a highly efficient Tree-based Decoder-Centric (TDC) algorithm for latent hole identification, with a focus on the text domain. In contrast to past studies, our approach targets the decoder network, since the decoder directly determines the quality of the model's output. Furthermore, we provide the first in-depth empirical analysis of the latent hole phenomenon. This analysis covers several important aspects, such as how holes affect the performance of VAE algorithms on text generation and how holes are distributed in the latent space.