The shift towards end-to-end deep learning has brought unprecedented advances in many areas of computer vision. However, deep neural networks are typically trained on images whose resolutions rarely exceed $1,000 \times 1,000$ pixels. The growing use of scanners that produce images with extremely high resolutions (on average around $100,000 \times 100,000$ pixels) therefore presents novel challenges to the field. Most published methods preprocess high-resolution images into a set of smaller patches, imposing an a priori belief about the best properties of the extracted patches (magnification, field of view, location, etc.). Herein, we introduce Magnifying Networks (MagNets) as an alternative deep learning solution for gigapixel image analysis that neither relies on a preprocessing stage nor requires the processing of billions of pixels. MagNets can learn to dynamically retrieve any part of a gigapixel image, at any magnification level and field of view, in an end-to-end fashion with minimal ground truth (a single global, slide-level label). Our results on the publicly available Camelyon16 and Camelyon17 datasets corroborate the effectiveness and efficiency of MagNets and the proposed optimization framework for whole-slide image classification. Importantly, MagNets process far fewer patches from each slide than any existing approach ($10$ to $300$ times fewer).