Training Generative Adversarial Networks (GANs) requires a large amount of data, which has motivated the development of new data augmentation methods to ease this requirement. Often, these methods either fail to produce enough new data or expand the dataset beyond the original knowledge domain. In this paper, we propose a new way of representing the available knowledge in the manifold of data barycenters. Such a representation allows data augmentation based on interpolation between the nearest data elements under the Wasserstein distance. The proposed method finds cliques in the nearest-neighbors graph and, at each sampling iteration, randomly draws one clique and computes its Wasserstein barycenter with random uniform weights. These barycenters then become new, natural-looking elements that can be added to the dataset. We apply this approach to the problem of landmark detection and augment the landmark data available in the dataset. Additionally, the idea is validated on cardiac data for the task of medical segmentation. Our approach reduces overfitting and improves the quality metrics beyond both the results obtained with the original data and those obtained with classical augmentation methods.
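The sampling procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: each data element is a point cloud with the same number of points and uniform mass, the nearest-neighbors graph is a mutual k-nearest-neighbor graph, "random uniform weights" means weights drawn uniformly from the simplex (a Dirichlet distribution with all parameters equal to 1), and the barycenter is approximated by a simple fixed-point iteration over optimal assignments. All function names (`w2_squared`, `mutual_knn_graph`, `maximal_cliques`, `barycenter`, `sample_augmented`) are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def w2_squared(a, b):
    """Squared 2-Wasserstein distance between equal-size, uniform point clouds."""
    cost = cdist(a, b, "sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


def mutual_knn_graph(clouds, k):
    """Adjacency sets of the mutual k-NN graph (an assumed graph construction)."""
    n = len(clouds)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = w2_squared(clouds[i], clouds[j])
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor
    knn = [set(np.argsort(row)[:k].tolist()) for row in dist]
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(len(adj))), set())
    return found


def barycenter(clouds, weights, n_iter=20):
    """Fixed-point approximation of a free-support W2 barycenter."""
    x = np.asarray(clouds[0], dtype=float).copy()
    for _ in range(n_iter):
        update = np.zeros_like(x)
        for cloud, w in zip(clouds, weights):
            # Match each support point of x to one point of the cloud.
            _, cols = linear_sum_assignment(cdist(x, cloud, "sqeuclidean"))
            update += w * np.asarray(cloud, dtype=float)[cols]
        x = update
    return x


def sample_augmented(clouds, k=3, n_samples=5, seed=0):
    """Draw a random clique per iteration and return its weighted barycenter."""
    rng = np.random.default_rng(seed)
    graph = mutual_knn_graph(clouds, k)
    cliques = [c for c in maximal_cliques(graph) if len(c) > 1]
    if not cliques:
        raise ValueError("no clique with at least two elements was found")
    samples = []
    for _ in range(n_samples):
        clique = cliques[rng.integers(len(cliques))]
        weights = rng.dirichlet(np.ones(len(clique)))  # uniform on the simplex
        samples.append(barycenter([clouds[i] for i in clique], weights))
    return samples
```

Calling `sample_augmented(clouds, k=3, n_samples=100)` on a list of `(n_points, 2)` arrays would return 100 new point clouds, each interpolating between mutually close originals. Restricting interpolation to cliques keeps every barycenter between elements that are all near one another, which is one plausible reading of how the method avoids leaving the original data domain.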