Automated cellular instance segmentation has been used to accelerate biological research for the past two decades, and recent advances have produced higher-quality results with less effort from the biologist. Most current efforts aim to remove the researcher from the loop entirely by building highly generalized models. However, these models invariably fail when faced with novel data whose distribution differs from that of the training data. Rather than relying on methods that presume large amounts of target data and computing power are available for retraining, in this work we address the even greater challenge of designing an approach that requires minimal new annotated data and minimal training time. We do so by designing specialized contrastive losses that make very efficient use of the few annotated samples. An extensive set of results shows that 3 to 5 annotations lead to models whose accuracy: 1) significantly mitigates the effects of covariate shift; 2) matches or surpasses that of other adaptation methods; 3) even approaches that of methods fully retrained on the target distribution. Adaptation training takes only a few minutes, paving a path towards a balance between model performance, computing requirements, and expert-level annotation needs.