Variational Autoencoders (VAEs) have proven remarkably effective at recovering latent spaces for several computer vision tasks. However, for a number of reasons, VAEs as currently trained appear to fall short of learning invariant and equivariant clusters in latent space. Our work addresses this problem and presents an approach that disentangles equivariant feature maps on a Lie group manifold by enforcing deep, group-invariant learning. Alongside a novel separation of the latent representation into semantic and equivariant variables, we formulate a modified Evidence Lower BOund (ELBO) that uses a mixture-model probability density function (PDF), such as a Gaussian mixture, for the invariant cluster embeddings, enabling superior unsupervised variational clustering. Our experiments show that the model effectively learns to disentangle the invariant and equivariant representations, with significant improvements in learning rate and observably better image recognition and canonical-state reconstruction than the current best deep learning models.
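The modified ELBO mentioned above is not specified in this abstract. As a rough illustration only, the Python sketch below shows the standard way to estimate an ELBO when the prior over a latent code is a Gaussian mixture. Because the KL divergence from a diagonal Gaussian posterior to a mixture has no closed form, it is estimated by Monte Carlo with reparameterised samples. Everything here is an assumption for illustration: the function names (`log_normal_diag`, `log_gmm`, `elbo_gmm_prior`), the diagonal covariances, and the decoder log-likelihood `log_lik` are hypothetical, and the equivariant (Lie group) latent variable is omitted entirely. This is a generic sketch, not the authors' formulation.

```python
import math
import random


def log_normal_diag(z, mean, log_var):
    """Log density of a diagonal Gaussian N(mean, exp(log_var)) evaluated at z."""
    return sum(
        -0.5 * (math.log(2 * math.pi) + lv + (zi - m) ** 2 / math.exp(lv))
        for zi, m, lv in zip(z, mean, log_var)
    )


def log_gmm(z, weights, means, log_vars):
    """Log density of a diagonal-covariance Gaussian mixture, via log-sum-exp."""
    terms = [
        math.log(w) + log_normal_diag(z, m, lv)
        for w, m, lv in zip(weights, means, log_vars)
    ]
    top = max(terms)  # subtract the max for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def elbo_gmm_prior(log_lik, q_mean, q_log_var, weights, means, log_vars,
                   n_samples=1000, rng=None):
    """Monte Carlo estimate of E_q[log p(x|z) + log p(z) - log q(z|x)].

    log_lik is a hypothetical callable z -> log p(x|z) (the decoder term);
    p(z) is a Gaussian mixture prior and q(z|x) a diagonal Gaussian posterior.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        # Reparameterised sample: z = mu + sigma * eps, with eps ~ N(0, 1)
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(q_mean, q_log_var)]
        total += (log_lik(z)
                  + log_gmm(z, weights, means, log_vars)
                  - log_normal_diag(z, q_mean, q_log_var))
    return total / n_samples
```

For example, `elbo_gmm_prior(lambda z: 0.0, [0.0], [0.0], [1.0], [[0.0]], [[0.0]])` returns `0.0`: the single-component prior coincides with the posterior, so every sampled KL contribution cancels exactly. Moving the mixture components away from the posterior makes the estimate negative, reflecting a positive KL penalty.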