We present a deep metric variational autoencoder for multi-modal data generation. The variational autoencoder employs a triplet loss in the latent space, which enables conditional data generation by sampling within each class cluster of the latent space. The approach is evaluated on a multi-modal dataset consisting of otoscopy images of the tympanic membrane with corresponding wideband tympanometry measurements. The modalities in this dataset are correlated, as they represent different aspects of the state of the middle ear, but they do not exhibit a direct pixel-to-pixel correspondence. The approach shows promising results for the conditional generation of paired images and tympanograms, and will allow for efficient data augmentation of multi-modal data.
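To make the core idea concrete, the following is a minimal PyTorch sketch of a VAE whose latent space is shaped by a triplet loss, combining the standard ELBO terms with a triplet margin term over the latent means. The class name `DeepMetricVAE`, the layer sizes, the single fully-connected encoder/decoder, and the weights `beta`, `gamma`, and `margin` are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepMetricVAE(nn.Module):
    """Minimal VAE with a metric-learning (triplet) objective on the latent space.

    A hypothetical single-modality sketch; the paper's model handles paired
    images and tympanograms, whose encoders/decoders are not specified here.
    """
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def loss_fn(model, anchor, positive, negative, beta=1.0, gamma=1.0):
    """ELBO terms for the anchor plus a triplet loss over latent means.

    `positive` shares the anchor's class label; `negative` does not.
    The weights beta and gamma are assumed, not taken from the paper.
    """
    recon, mu_a, logvar_a = model(anchor)
    _, mu_p, _ = model(positive)
    _, mu_n, _ = model(negative)
    # Reconstruction term of the ELBO
    recon_loss = F.binary_cross_entropy(recon, anchor, reduction="sum")
    # KL divergence between q(z|x) and the standard normal prior
    kld = -0.5 * torch.sum(1 + logvar_a - mu_a.pow(2) - logvar_a.exp())
    # Triplet margin loss pulls same-class latents together,
    # pushes different-class latents apart
    triplet = F.triplet_margin_loss(mu_a, mu_p, mu_n, margin=1.0)
    return recon_loss + beta * kld + gamma * triplet
```

Under this formulation, conditional generation would amount to sampling latent vectors near the cluster of the desired class (e.g., around the mean latent code of that class's training examples) and decoding them, since the triplet term encourages class-separated clusters in the latent space.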