Deep neural networks (DNNs) show promise in image-based medical diagnosis, but cannot be fully trusted since their performance can be severely degraded by dataset shifts to which human perception remains invariant. If we can better understand the differences between human and machine perception, we can potentially characterize and mitigate this effect. We therefore propose a framework for comparing human and machine perception in medical diagnosis. The two are compared with respect to their sensitivity to the removal of clinically meaningful information, and to the regions of an image deemed most suspicious. Drawing inspiration from the natural image domain, we frame both comparisons in terms of perturbation robustness. The novelty of our framework is that separate analyses are performed for subgroups with clinically meaningful differences. We argue that this is necessary in order to avert Simpson's paradox and draw correct conclusions. We demonstrate our framework with a case study in breast cancer screening, and reveal significant differences between radiologists and DNNs. We compare the two with respect to their robustness to Gaussian low-pass filtering, performing a subgroup analysis on microcalcifications and soft tissue lesions. For microcalcifications, DNNs use a different set of high-frequency components than radiologists, some of which lie outside the image regions considered most suspicious by radiologists. These features run the risk of being spurious, but if not, could represent potential new biomarkers. For soft tissue lesions, the divergence between radiologists and DNNs is even starker, with DNNs relying heavily on spurious high-frequency components ignored by radiologists. Importantly, this deviation in soft tissue lesions was only observable through subgroup analysis, which highlights the importance of incorporating medical domain knowledge into our comparison framework.
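The perturbation at the heart of the case study is Gaussian low-pass filtering: progressively removing high spatial frequencies and observing how predictions degrade. The sketch below is not the paper's code; it is a minimal numpy-only illustration of applying a Gaussian low-pass filter in the frequency domain, where the image array and the sigma sweep are made-up stand-ins for a mammogram patch and the paper's filter-severity schedule.

```python
import numpy as np

def gaussian_low_pass(image, sigma):
    """Attenuate high spatial frequencies with a Gaussian transfer function.

    The multiplier exp(-2 * (pi * sigma * f)^2) is the Fourier transform of
    a spatial Gaussian with standard deviation sigma (in pixels), so larger
    sigma suppresses more high-frequency content while leaving the DC
    component (the image mean) untouched.
    """
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))

# Sweep filter severity on a synthetic high-detail patch (a stand-in for a
# real image): variance falls monotonically as high frequencies are removed,
# which is exactly the information a frequency-reliant model would lose.
rng = np.random.default_rng(0)
patch = rng.random((64, 64))
variances = [gaussian_low_pass(patch, s).var() for s in (0.5, 1.0, 2.0, 4.0)]
assert all(a > b for a, b in zip(variances, variances[1:]))
```

In a robustness comparison, one would run the diagnostic model (or show radiologists the images) at each severity level and track how performance decays as sigma grows.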
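The abstract argues that subgroup analysis is needed to avert Simpson's paradox: a trend present in every subgroup can vanish or reverse when subgroups are pooled. The following sketch uses entirely hypothetical counts (the subgroup names echo the paper's lesion types, but the numbers are illustrative only) to show how filtering can hurt accuracy within each subgroup while the pooled comparison suggests the opposite.

```python
# Hypothetical (correct, total) counts per (subgroup, condition).
# These numbers are invented purely to exhibit the reversal pattern.
counts = {
    ("microcalc", "unfiltered"):   (81, 87),
    ("microcalc", "filtered"):     (234, 270),
    ("soft_tissue", "unfiltered"): (192, 263),
    ("soft_tissue", "filtered"):   (55, 80),
}
groups = ("microcalc", "soft_tissue")

# Within each subgroup, the filtered condition has lower accuracy.
for g in groups:
    unf = counts[(g, "unfiltered")][0] / counts[(g, "unfiltered")][1]
    fil = counts[(g, "filtered")][0] / counts[(g, "filtered")][1]
    assert fil < unf

# Pooled across subgroups, the comparison reverses, because the two
# conditions are evaluated on very different subgroup mixtures.
pooled = {}
for cond in ("unfiltered", "filtered"):
    correct = sum(counts[(g, cond)][0] for g in groups)
    total = sum(counts[(g, cond)][1] for g in groups)
    pooled[cond] = correct / total
assert pooled["filtered"] > pooled["unfiltered"]
```

A pooled-only analysis here would conclude that filtering is harmless (or even helpful), the opposite of what every subgroup shows, which is why the framework insists on clinically meaningful stratification.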


