An important limitation to the development of AI-based solutions for In Vitro Fertilization (IVF) is the black-box nature of most state-of-the-art models. This opacity stems from the complexity of deep learning architectures and raises potential bias and fairness issues. The need for interpretable AI has grown not only in the IVF field but also across the deep learning community in general. This has started a trend in the literature in which authors focus on designing objective metrics to evaluate generic explanation methods. In this paper, we study the behavior of recently proposed objective faithfulness metrics applied to the problem of embryo stage identification. We benchmark attention models and post-hoc methods using these metrics and show empirically that (1) the metrics produce low overall agreement on the model ranking and (2) depending on the metric's approach, either post-hoc methods or attention models are favored. We conclude with general remarks about the difficulty of defining faithfulness and the necessity of understanding its relationship with the type of approach that is favored.
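The abstract does not state how agreement between metrics on the model ranking is quantified. One standard choice is Kendall's coefficient of concordance (W), which treats each faithfulness metric as a "rater" ranking the same set of explanation methods. The sketch below assumes that choice; the function names and the dictionary layout (metric name mapped to one score per method, higher meaning more faithful) are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def rank_descending(scores: List[float]) -> List[float]:
    """Rank scores so the highest gets rank 1; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kendalls_w(score_table: Dict[str, List[float]]) -> float:
    """Kendall's W for m metrics (raters) ranking the same n methods (items).

    Hypothetical layout: score_table[metric_name][method_index] = score.
    Returns 1.0 for perfect agreement and values near 0.0 for no agreement.
    No tie correction is applied.
    """
    rankings = [rank_descending(scores) for scores in score_table.values()]
    m = len(rankings)
    n = len(rankings[0])
    # Sum of ranks each method receives across all metrics.
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

A low W across metrics would correspond to the "low overall agreement on the model ranking" reported above; pairwise Kendall's tau between individual metrics is a common alternative when per-pair disagreement is of interest.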