Radiologists today play a key role in making diagnostic decisions and labeling images for training A.I. algorithms. Inter-reader reliability (IRR) between experts can be low when interpreting challenging cases. While team-based decisions are known to outperform individual decisions, inter-personal biases often creep into group interactions and prevent non-dominant participants from expressing their true opinions. To overcome the dual problems of low consensus and inter-personal bias, we explored a solution modeled on biological swarms of bees. Two separate cohorts (three radiologists and five radiology residents) collaborated on a digital swarm platform in real time and in a blinded fashion, grading meniscal lesions on knee MR exams. These consensus votes were benchmarked against clinical (arthroscopy) and radiological (senior-most radiologist) observations. The IRR of the consensus votes was compared to the IRR of the majority and most confident votes of the two cohorts. The radiologist cohort saw a 23% improvement in the IRR of swarm votes over majority vote. A similar 23% improvement in IRR over majority vote was observed for 3-resident swarm votes. The 5-resident swarm showed an even larger improvement of 32% in IRR over majority vote. Swarm consensus votes also improved specificity by up to 50%. The swarm consensus votes outperformed individual and majority-vote decisions in both the radiologist and resident cohorts. The 5-resident swarm had higher IRR than the 3-resident swarm, indicating a positive effect of increased swarm size. The attending and resident swarms also outperformed predictions from a state-of-the-art A.I. algorithm. Using a digital swarm platform improved agreement and allowed participants to express judgment-free intent, resulting in superior clinical performance and robust A.I. training labels.
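The majority-vote baseline and an agreement score against a benchmark can be illustrated with a short sketch. The abstract does not name its IRR statistic, so this example assumes Cohen's kappa between a vote set and a reference such as arthroscopy. The grade codes (0, 1, 2), the reader data, and the helper names `majority_vote` and `cohens_kappa` are all hypothetical. The swarm consensus itself comes from a real-time interactive platform and is not modeled here.

```python
from collections import Counter


def majority_vote(grades):
    """Return the most common grade among readers for one exam.

    Ties go to the grade seen first (Counter.most_common keeps
    first-encountered order for equal counts). This tie rule is an
    arbitrary assumption for illustration.
    """
    return Counter(grades).most_common(1)[0][0]


def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences (assumed IRR metric)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of exams where both sequences match.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each sequence's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical grades: one row per reader, one column per knee MR exam.
readers = [
    [1, 0, 2, 1, 0, 1],
    [1, 1, 2, 0, 0, 1],
    [0, 0, 2, 1, 0, 2],
]
# Hypothetical benchmark grades (e.g. arthroscopy findings).
reference = [1, 0, 2, 1, 0, 2]

# zip(*readers) regroups the grades per exam before voting.
majority = [majority_vote(exam) for exam in zip(*readers)]
kappa = cohens_kappa(majority, reference)
print(f"majority={majority} kappa={kappa:.2f}")  # prints majority=[1, 0, 2, 1, 0, 1] kappa=0.75
```

Kappa corrects raw agreement for agreement expected by chance, which is why it is a common IRR choice. Agreement among several readers at once would more typically use a multi-rater statistic such as Fleiss' kappa.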