Multi-instance learning is common in computer vision tasks, especially in biomedical image processing. Traditional methods for multi-instance learning focus on designing feature aggregation methods and multi-instance classifiers, with the aggregation performed in either the feature extraction phase or the learning phase. As deep neural networks (DNNs) have achieved great success in image processing through automatic feature learning, feature aggregation mechanisms need to be incorporated into common DNN architectures to support multi-instance learning. Moreover, flexibility and reliability are crucial for handling instances that vary in quality and number. In this study, we propose a hierarchical aggregation network for multi-instance learning, called HAMIL. The hierarchical aggregation protocol fuses features in a defined order, and the simple convolutional aggregation units yield an efficient and flexible architecture. We assess the model's performance on two microscopy image classification tasks: protein subcellular localization using immunofluorescence images, and gene annotation using spatial gene expression images. The experimental results show that HAMIL outperforms state-of-the-art feature aggregation methods and existing models on both tasks. Visualization analyses also demonstrate the ability of HAMIL to focus on high-quality instances.
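To make the idea of ordered feature fusion concrete, the following is a minimal sketch and not the HAMIL implementation. It assumes that each instance in a bag has already been encoded as a fixed-length feature vector, that the fusion order is chosen by repeatedly merging the two closest vectors (an assumption made here for illustration), and that an aggregation unit can be approximated by a 1x1 convolution over a two-channel stack. The names `aggregation_unit` and `hierarchical_aggregate` are hypothetical, and the weights are fixed, whereas in a trained network they would be learned.

```python
import numpy as np


def aggregation_unit(a, b, weights=(0.5, 0.5), bias=0.0):
    """Fuse two feature vectors with a 1x1 convolution over a 2-channel stack.

    Hypothetical stand-in for a convolutional aggregation unit; the weights
    are fixed here for illustration and would normally be learned.
    """
    stacked = np.stack([a, b], axis=0)       # shape (2, d)
    w = np.asarray(weights, dtype=float)     # shape (2,)
    return np.tensordot(w, stacked, axes=1) + bias  # shape (d,)


def hierarchical_aggregate(instances, unit=aggregation_unit):
    """Merge the two closest feature vectors repeatedly until one remains.

    The nearest-pair ordering is an assumption of this sketch. The output
    has the same dimension regardless of how many instances the bag holds.
    """
    feats = [np.asarray(x, dtype=float) for x in instances]
    if not feats:
        raise ValueError("a bag must contain at least one instance")
    while len(feats) > 1:
        # Find the pair of current features with the smallest distance.
        best = None
        for i in range(len(feats)):
            for j in range(i + 1, len(feats)):
                dist = np.linalg.norm(feats[i] - feats[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        fused = unit(feats[i], feats[j])
        # Replace the merged pair with its fused representation.
        feats = [f for k, f in enumerate(feats) if k not in (i, j)] + [fused]
    return feats[0]


bag = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
print(hierarchical_aggregate(bag))
```

Because each unit takes two inputs and returns one output of the same size, the same unit can be reused for bags of any size, which is one way to read the flexibility claim in the abstract.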