While the need for interpretable machine learning has been established, many common approaches are slow, lack fidelity, or are hard to evaluate. Amortized explanation methods reduce the cost of providing interpretations by learning a global selector model that returns feature importances for a single instance of data. The selector model is trained to optimize the fidelity of the interpretations, as evaluated by a predictor model for the target. Popular methods learn the selector and predictor model in concert, which we show allows predictions to be encoded within interpretations. We introduce EVAL-X as a method to quantitatively evaluate interpretations and REAL-X as an amortized explanation method, which learn a predictor model that approximates the true data generating distribution given any subset of the input. We show EVAL-X can detect when predictions are encoded in interpretations and show the advantages of REAL-X through quantitative and radiologist evaluation.
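To make the contrast concrete, below is a minimal sketch (not the authors' code) of the two training regimes the abstract describes, using PyTorch on toy tabular data. All names here (`Selector`, `Predictor`, `train_predictor_on_random_subsets`, `train_selector`, the dimensions, and the hyperparameters) are hypothetical illustrations under the stated assumptions: a selector outputs per-feature selection probabilities, and a predictor scores the target from a masked input.

```python
# Hypothetical sketch of amortized explanation with a selector/predictor pair.
import torch
import torch.nn as nn

d_in, d_out = 10, 2  # assumed toy feature and class counts

class Selector(nn.Module):
    """Global selector: maps an instance to per-feature selection probabilities."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(), nn.Linear(32, d_in))
    def forward(self, x):
        return torch.sigmoid(self.net(x))  # importance / selection probability per feature

class Predictor(nn.Module):
    """Predictor: evaluates the target given a masked (subset) input."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d_in, 32), nn.ReLU(), nn.Linear(32, d_out))
    def forward(self, x, mask):
        # Concatenating the mask tells the model which features were observed.
        return self.net(torch.cat([x * mask, mask], dim=-1))

def train_predictor_on_random_subsets(predictor, x, y, steps=500):
    """REAL-X-style predictor training: random masks, so the predictor learns to
    approximate p(y | x_S) for any subset S, independently of the selector."""
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
    for _ in range(steps):
        mask = torch.bernoulli(torch.full_like(x, 0.5))  # random feature subsets
        loss = nn.functional.cross_entropy(predictor(x, mask), y)
        opt.zero_grad(); loss.backward(); opt.step()

def train_selector(selector, predictor, x, y, steps=500, lam=0.1):
    """Selector training against a frozen predictor. If the predictor were
    instead updated on this same loss (joint training, as in the popular
    methods the abstract critiques), the pair could co-adapt so that the mask
    pattern itself encodes the prediction."""
    opt = torch.optim.Adam(selector.parameters(), lr=1e-3)
    for _ in range(steps):
        probs = selector(x)
        # Straight-through Bernoulli sampling keeps the discrete mask differentiable.
        mask = torch.bernoulli(probs).detach() + probs - probs.detach()
        fidelity = nn.functional.cross_entropy(predictor(x, mask), y)
        sparsity = probs.mean()  # penalize selecting many features
        loss = fidelity + lam * sparsity
        opt.zero_grad(); loss.backward(); opt.step()

# Toy usage: labels depend only on the first two features.
x = torch.randn(256, d_in)
y = (x[:, 0] + x[:, 1] > 0).long()
predictor, selector = Predictor(), Selector()
train_predictor_on_random_subsets(predictor, x, y)
for p in predictor.parameters():
    p.requires_grad_(False)  # freeze: the selector cannot co-adapt the predictor
train_selector(selector, predictor, x, y)
```

The key design choice this sketch isolates is the freeze before `train_selector`: because the predictor was fit on random subsets and is held fixed, the selector is rewarded only for picking genuinely informative features, rather than for inventing a mask pattern that a jointly trained predictor could decode into the label.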