Automated visual localisation of subcellular proteins can accelerate our understanding of cell function in health and disease. Despite recent advances in machine learning (ML), humans still attain superior accuracy by drawing on diverse clues. We show how this gap can be narrowed by addressing three key aspects: (i) automated improvement of cell annotation quality, (ii) new Convolutional Neural Network (CNN) architectures that support unbalanced and noisy data, and (iii) informed selection and fusion of multiple, diverse ML models. We introduce a new "AI-trains-AI" method for improving the quality of weak labels and propose novel CNN architectures that exploit wavelet filters and Weibull activations. We also explore key factors in the multi-CNN ensembling process by analysing correlations between image-level and cell-level predictions. Finally, in the context of the Human Protein Atlas, we demonstrate that our system achieves state-of-the-art performance in the multi-label single-cell classification of protein localisation patterns and significantly improves generalisation ability.
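As a rough illustration of what analysing the correlation between image-level and cell-level predictions and then fusing them could look like, the following minimal sketch computes a Pearson correlation between two per-class probability vectors and blends them with a fixed weight. It is not the system described above: the helper names `pearson` and `fuse`, the `cell_weight` parameter, and the example probabilities are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # Return 0.0 when either vector is constant (correlation undefined).
    return cov / (sx * sy) if sx and sy else 0.0


def fuse(cell_probs, image_probs, cell_weight=0.5):
    """Blend per-class cell-level and image-level probabilities.

    cell_weight is a hypothetical mixing parameter, not a value from the paper.
    """
    return [
        cell_weight * c + (1.0 - cell_weight) * i
        for c, i in zip(cell_probs, image_probs)
    ]


# Made-up per-class probabilities for one cell and for its parent image.
cell_probs = [0.9, 0.2, 0.1, 0.6]
image_probs = [0.8, 0.4, 0.1, 0.3]

agreement = pearson(cell_probs, image_probs)  # how closely the two levels agree
fused = fuse(cell_probs, image_probs)         # simple weighted average per class
```

In a sketch like this, a correlation close to 1 would suggest the two prediction levels carry largely redundant information, while a lower value would suggest that fusing them may add complementary signal.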