An increasing number of computer vision applications, especially in medical imaging and remote sensing, involve classifying very large images that contain tiny objects. More specifically, these classification tasks face two key challenges: $i$) the input images in the target dataset are usually on the order of megapixels, yet existing deep architectures cannot easily operate on such large images due to memory constraints, so a memory-efficient method is needed to process them; and $ii$) only a small fraction of each input image is informative of the label of interest, resulting in a low region-of-interest (ROI) to image ratio. However, most current convolutional neural networks (CNNs) are designed for image classification datasets with relatively large ROIs and small (sub-megapixel) images. Existing approaches have addressed these two challenges only in isolation. We present an end-to-end CNN model, termed Zoom-In network, that leverages hierarchical attention sampling to classify large images with tiny objects using a single GPU. We evaluate our method on two large-image datasets and one gigapixel dataset. Experimental results show that our model achieves higher accuracy than existing methods while requiring fewer computing resources.