While much effort has been devoted to improving Variational Autoencoders through richer posterior and prior distributions, little attention has been paid to amending the way the data are generated. In this paper, we develop two generation procedures that are not \emph{prior-dependent}, relying instead on the geometry of the latent space seen as a Riemannian manifold. The first consists in sampling along geodesic paths, which is a natural way to explore the latent space, while the second consists in sampling from the inverse of the metric volume element, which is easier to use in practice. Both methods are then compared to \emph{prior-based} methods on various data sets and appear well suited to the limited-data regime. Finally, the latter method is used to perform data augmentation in a small-sample-size setting and is validated across several standard and \emph{real-life} data sets. In particular, this scheme greatly improves classification results on the OASIS database, where balanced accuracy jumps from 80.7% for a classifier trained with the raw data to 89.1% when trained only with the synthetic data generated by our method. Similar results were also observed on 4 standard data sets.
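For concreteness, a minimal sketch of the second scheme under an assumed notation (the metric tensor of the latent Riemannian manifold is written $G(z)$, a symbol not introduced in this abstract): sampling from the inverse of the metric volume element amounts to drawing latent codes $z$ from a density proportional to the reciprocal of the Riemannian volume measure,
\begin{equation*}
    p(z) \;\propto\; \frac{1}{\sqrt{\det G(z)}} \;=\; \sqrt{\det G^{-1}(z)},
\end{equation*}
normalized over the region of the latent space where this quantity is integrable. This is an illustrative formulation of the idea stated above, not the exact estimator developed in the paper.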