One noted issue of the vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, a problem known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called the stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend in which quantization is stochastic at the initial stage of training but gradually converges toward deterministic quantization, a phenomenon we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without relying on common heuristics. Furthermore, we empirically show that SQ-VAE outperforms VAE and VQ-VAE in vision- and speech-related tasks.
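To make the idea of stochastic quantization concrete, the sketch below illustrates one plausible form: instead of deterministically assigning each encoder output to its nearest codeword, a codeword is sampled from a softmax over negative distances, with a temperature that can be annealed toward zero so that sampling approaches the usual argmin quantization. This is a minimal illustration under assumed shapes and an assumed temperature parameter, not the exact formulation used in SQ-VAE.

```python
import torch
import torch.nn.functional as F

def stochastic_quantize(z_e, codebook, temperature):
    """Sample codebook indices from a categorical distribution over codewords.

    z_e:         (batch, dim) continuous encoder outputs (hypothetical shapes)
    codebook:    (K, dim) learnable codewords
    temperature: scalar > 0; as it decreases, the categorical distribution
                 sharpens and sampling approaches deterministic nearest-
                 neighbour (argmin) quantization.
    """
    # Squared Euclidean distances between each encoding and every codeword: (batch, K)
    dists = torch.cdist(z_e, codebook, p=2).pow(2)
    # Softmax over negative distances defines a categorical posterior over codewords
    probs = F.softmax(-dists / temperature, dim=-1)
    # Draw one codeword index per encoding
    idx = torch.distributions.Categorical(probs=probs).sample()
    return codebook[idx], idx

# Example usage with toy tensors
z_e = torch.randn(8, 64)          # 8 encoder outputs of dimension 64
codebook = torch.randn(512, 64)   # codebook with K = 512 entries
z_q, idx = stochastic_quantize(z_e, codebook, temperature=1.0)
```

With a high temperature, many codewords receive non-negligible probability, which is consistent with the observed self-annealing behavior: early in training the assignment is stochastic, and as the effective temperature shrinks the quantization becomes essentially deterministic.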