Discrete latent spaces in variational autoencoders have been shown to effectively capture the data distribution for many real-world problems such as natural language understanding, human intent prediction, and visual scene representation. However, discrete latent spaces need to be sufficiently large to capture the complexities of real-world data, rendering downstream tasks computationally challenging. For instance, performing motion planning in a high-dimensional latent representation of the environment could be intractable. We consider the problem of sparsifying the discrete latent space of a trained conditional variational autoencoder, while preserving its learned multimodality. As a post hoc latent space reduction technique, we use evidential theory to identify the latent classes that receive direct evidence from a particular input condition and filter out those that do not. Experiments on diverse tasks, such as image generation and human behavior prediction, demonstrate the effectiveness of our proposed technique at reducing the discrete latent sample space size of a model while maintaining its learned multimodality.
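To make the filtering step concrete, below is a minimal sketch (not the authors' implementation) of how per-class direct evidence might be extracted from the final softmax layer of a CVAE prior network and used to prune latent classes. The function name `evidential_sparsify`, the positive-part evidence heuristic, and the `keep_threshold` parameter are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

def evidential_sparsify(features, weight, bias, keep_threshold=0.0):
    """Hypothetical post hoc sparsification of a discrete latent distribution.

    features : (d,) activations feeding the final softmax layer of the
               CVAE prior network for one input condition.
    weight   : (K, d) final-layer weights, one row per latent class.
    bias     : (K,) final-layer biases.

    Under an evidential (Dempster-Shafer style) reading of the softmax layer,
    only the positive weight-feature products count as direct evidence for a
    class; classes whose accumulated positive evidence does not exceed
    `keep_threshold` are filtered out, and the softmax is renormalized over
    the surviving classes.
    """
    logits = features @ weight.T + bias                                 # (K,) standard softmax logits
    pos_evidence = torch.clamp(weight * features, min=0.0).sum(dim=1)   # (K,) direct evidence per class
    keep = pos_evidence > keep_threshold                                 # mask of supported latent classes

    sparse_logits = logits.masked_fill(~keep, float("-inf"))
    sparse_probs = torch.softmax(sparse_logits, dim=0)                   # renormalized over kept classes
    return sparse_probs, keep

# Toy usage: 8 latent classes, 16-dimensional features from the prior network.
torch.manual_seed(0)
phi = torch.randn(16)
W, b = torch.randn(8, 16), torch.zeros(8)
probs, kept = evidential_sparsify(phi, W, b)
print(int(kept.sum()), "latent classes kept out of", len(kept))
```

Because the filtering happens entirely in the final linear layer of an already trained prior network, a sketch like this leaves the encoder and decoder untouched, which is what makes the reduction post hoc.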