We propose a composable framework for latent-space image augmentation that makes it easy to combine multiple augmentations. Image augmentation has been shown to improve performance on a wide variety of image classification and generation tasks. Our framework is based on the Variational Autoencoder (VAE) architecture and performs augmentation through linear transformations applied within the latent space itself. We explore losses and latent geometries that constrain these transformations to be composable and involutory, so that they can be readily combined or inverted. Finally, we show that these properties hold better for certain pairs of augmentations, but that the latent space can be transferred to other sets of augmentations to modify performance, effectively constraining the VAE's bottleneck to preserve the variance of the specific augmentations and image features we care about. We demonstrate the effectiveness of our approach with initial results on the MNIST dataset, compared against both a standard VAE and a Conditional VAE. This latent augmentation method allows much greater control over, and geometric interpretability of, the latent space, making it a valuable tool for researchers and practitioners in the field.
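To make the idea of composable, self-inverse linear transformations in latent space concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the method's actual implementation: the penalty and loss forms, the helper names (`householder`, `involution_penalty`, `consistency_loss`), the latent dimension, and the use of Householder reflections as example self-inverse matrices are all hypothetical choices, and random vectors stand in for encoder outputs.

```python
import numpy as np


def householder(v):
    """Reflection across the hyperplane orthogonal to v; satisfies H @ H = I."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return np.eye(v.size) - 2.0 * np.outer(v, v)


def involution_penalty(A):
    """Squared Frobenius norm of A @ A - I; zero exactly when A is its own inverse."""
    d = A.shape[0]
    return float(np.sum((A @ A - np.eye(d)) ** 2))


def consistency_loss(A, z, z_aug):
    """Mean squared error between the transformed latents (rows of z mapped by A)
    and the latents of the augmented images (rows of z_aug)."""
    return float(np.mean((z @ A.T - z_aug) ** 2))


rng = np.random.default_rng(0)
latent_dim = 8  # assumed size, chosen only for the example

# Two hypothetical latent augmentation matrices, each exactly self-inverse.
A_a = householder(rng.normal(size=latent_dim))
A_b = householder(rng.normal(size=latent_dim))

# Stand-in for a batch of encoder outputs (one latent vector per row).
z = rng.normal(size=(4, latent_dim))

# Compose: apply augmentation a, then augmentation b.
z_both = z @ A_a.T @ A_b.T

# Invert: because each matrix is its own inverse, undo them in reverse order.
z_back = z_both @ A_b.T @ A_a.T
```

In a training setup along these lines, a term like `involution_penalty` would presumably be added to the VAE objective for learned matrices, while `consistency_loss` would tie each matrix to the latent code of the correspondingly augmented image.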