In recent years, many 'explainable artificial intelligence' (xAI) approaches have been developed, but they have not always been evaluated objectively. To assess the quality of heatmaps generated by various saliency methods, we developed a framework that generates artificial data containing synthetic lesions together with a known ground-truth map. Using this framework, we evaluated two data sets with different backgrounds, Perlin noise and 2D brain MRI slices, and found that the heatmaps vary strongly across saliency methods and backgrounds. We strongly encourage further evaluation of saliency maps and xAI methods using such a framework before they are applied in clinical or other safety-critical settings.
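The idea of pairing a synthetic lesion with a known ground-truth map can be sketched as follows. This is a minimal illustration, not the authors' actual framework: the function name, image size, and lesion parameters are assumptions, and a smoothed random grid stands in for true Perlin noise.

```python
import numpy as np

def make_synthetic_sample(size=64, lesion_center=(32, 32), lesion_radius=6, seed=0):
    """Generate a noisy background with a synthetic lesion and its ground-truth mask.

    Hypothetical sketch: a block-upsampled random grid approximates a
    Perlin-like background; the real framework may differ.
    """
    rng = np.random.default_rng(seed)
    # Smooth-noise background: low-resolution random grid upsampled by repetition.
    coarse = rng.random((size // 8, size // 8))
    background = np.kron(coarse, np.ones((8, 8)))
    # Additive Gaussian blob as the synthetic lesion.
    yy, xx = np.mgrid[0:size, 0:size]
    d2 = (yy - lesion_center[0]) ** 2 + (xx - lesion_center[1]) ** 2
    lesion = np.exp(-d2 / (2 * lesion_radius ** 2))
    image = background + lesion
    # Known ground-truth map: pixels where the lesion dominates.
    ground_truth = (lesion > 0.5).astype(np.uint8)
    return image, ground_truth
```

Because the ground-truth mask is constructed alongside the image, a saliency heatmap produced for this sample can be compared directly against it, which is what makes an objective evaluation possible.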