In this paper, we propose a new method for performing reliable data augmentation in the high-dimensional, low-sample-size (HDLSS) setting using a geometry-based variational autoencoder (VAE). Our approach combines a proper modeling of the VAE latent space, seen as a Riemannian manifold, with a new generation scheme that produces more meaningful samples, especially in the context of small data sets. The proposed method is tested through a wide experimental study that stresses its robustness to data sets, classifiers, and training sample size. It is also validated on a medical imaging classification task on the challenging ADNI database, where a small number of 3D brain MRIs are considered and augmented using the proposed VAE framework. In each case, the proposed method allows for a significant and reliable gain in the classification metrics. For instance, the balanced accuracy of a state-of-the-art CNN classifier jumps from 66.3% to 74.3% when trained with 50 MRIs of cognitively normal (CN) and 50 MRIs of Alzheimer's disease (AD) patients, and from 77.7% to 86.3% when trained with 243 CN and 210 AD, while greatly improving sensitivity and specificity.