In this paper, we introduce a training approach based on Variational Autoencoders (VAEs) that compresses and decompresses cancer pathology slides at a compression ratio of 1:512. This exceeds the previously reported state of the art (SOTA) in the literature while maintaining accuracy on clinical validation tasks. We also test the compression approach on more common computer vision datasets such as CIFAR10. We then explore which image characteristics make this compression ratio achievable on cancer imaging data but not on generic images. Finally, we generate and visualize embeddings from the compressed latent space. We show how these embeddings support clinical interpretation of the data, and how they could be used in the future to accelerate search of clinical imaging data.