Variational Autoencoders (VAEs) employ deep learning models to learn a continuous latent z-space underlying a high-dimensional observed dataset. This enables many tasks, including face reconstruction and face synthesis. In this work, we investigate how face masks can help the training of VAEs for face reconstruction by restricting the loss to the pixels selected by the face mask. An evaluation of the proposal on the CelebA dataset shows that the reconstructed images are enhanced by the face masks, especially when the SSIM loss is used together with either the l1 or the l2 loss function. We noticed that adding a decoder for face mask prediction to the architecture affected performance when the l1 or l2 loss was used, while this was not the case for the SSIM loss. Moreover, the SSIM perceptual loss yielded the crispest samples among all the hypotheses tested, although it shifts the original colors of the image, so using the l1 or l2 loss together with SSIM helps to mitigate this issue.
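A minimal sketch of the masked reconstruction loss the abstract describes, assuming a PyTorch setup: the l1 and SSIM terms are computed only over the pixels selected by a binary face mask, and the two terms are mixed. The function names, the uniform 7x7 SSIM window, and the 0.5 weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def _ssim_map(x, y, window_size=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM map, using a uniform window (a simplification of the
    usual Gaussian window). Inputs are (B, C, H, W) tensors in [0, 1]."""
    pad = window_size // 2
    channels = x.shape[1]
    # Depthwise averaging filter: one uniform k x k kernel per channel.
    w = torch.ones(channels, 1, window_size, window_size,
                   device=x.device, dtype=x.dtype) / window_size ** 2
    mu_x = F.conv2d(x, w, padding=pad, groups=channels)
    mu_y = F.conv2d(y, w, padding=pad, groups=channels)
    var_x = F.conv2d(x * x, w, padding=pad, groups=channels) - mu_x ** 2
    var_y = F.conv2d(y * y, w, padding=pad, groups=channels) - mu_y ** 2
    cov_xy = F.conv2d(x * y, w, padding=pad, groups=channels) - mu_x * mu_y
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def masked_l1_ssim_loss(recon, target, mask, ssim_weight=0.5):
    """Reconstruction loss restricted to face pixels: the l1 term and the
    (1 - SSIM) term are both averaged only over mask-selected pixels.
    `mask` is a (B, 1, H, W) binary tensor; the weighting is a placeholder."""
    mask = mask.expand_as(recon)          # broadcast the 1-channel mask to RGB
    n = mask.sum().clamp(min=1.0)         # number of selected pixel values
    l1 = (mask * (recon - target).abs()).sum() / n
    ssim = (mask * _ssim_map(recon, target)).sum() / n
    return (1 - ssim_weight) * l1 + ssim_weight * (1.0 - ssim)

# Usage: recon and target are decoder output and input batch, e.g. (B, 3, 64, 64)
# in [0, 1]; mask comes from a face-segmentation step. This masked term would be
# added to the usual VAE KL term during training.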