Cytology is a low-cost and non-invasive diagnostic procedure employed to support the diagnosis of a broad range of pathologies. Computer vision technologies, by automatically generating quantitative and objective descriptions of an examination's contents, can help minimize the chances of misdiagnosis and shorten the time required for analysis. To identify the state of the art of computer vision techniques currently applied to cytology, we conducted a Systematic Literature Review. We analyzed papers published in the last 5 years. The initial search was executed in September 2020 and returned 431 articles. After applying the inclusion/exclusion criteria, 157 papers remained, which we analyzed to build a picture of the trends and problems present in this research area, highlighting the computer vision methods, staining techniques, evaluation metrics, and the availability of the datasets and computer code used. As a result, we identified that the most used methods in the analyzed works are deep learning-based (70 papers), while fewer works employ classic computer vision only (101 papers). The most recurrent metric for classification and object detection was accuracy (33 and 5 papers, respectively), while for segmentation it was the Dice Similarity Coefficient (38 papers). Regarding staining techniques, Papanicolaou was the most employed (130 papers), followed by H&E (20 papers) and Feulgen (5 papers). Twelve of the datasets used in the papers are publicly available, with the DTU/Herlev dataset being the most used. We conclude that there is still a lack of high-quality datasets for many types of stains, and that most of the works are not mature enough to be applied in a daily clinical diagnostic routine. We also identified a growing tendency towards adopting deep learning-based approaches as the methods of choice.
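For readers less familiar with the two metrics named above, the following is a minimal sketch of how accuracy and the Dice Similarity Coefficient are commonly computed. It is an illustration only, not code from the reviewed papers: the function names, the flat 0/1 mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def dice_coefficient(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for flat binary masks (0/1)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2 * intersection / (size_a + size_b)


# Hypothetical 8-pixel example: 2 foreground pixels in the reference,
# of which the prediction finds only 1.
reference = [1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0]

print(accuracy(reference, predicted))          # 7 of 8 pixels agree
print(dice_coefficient(reference, predicted))  # 2 * 1 / (2 + 1)
```

In this toy case the pixel-wise accuracy is 0.875 while the Dice score is about 0.667, because Dice ignores the background pixels on which both masks trivially agree.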