Counterfactual instances are a powerful tool for gaining insight into automated decision processes: they describe the minimal changes in the input space needed to alter a prediction towards a desired target. Most previous approaches require a separate, computationally expensive optimization procedure per instance, making them impractical for both large datasets and high-dimensional data. Moreover, these methods are often restricted to certain subclasses of machine learning models (e.g., differentiable or tree-based models). In this work, we propose a deep reinforcement learning approach that transforms the optimization procedure into an end-to-end learnable process, allowing us to generate batches of counterfactual instances in a single forward pass. Our experiments on real-world data show that our method i) is model-agnostic (does not assume differentiability), relying only on feedback from model predictions; ii) allows for generating target-conditional counterfactual instances; iii) allows for flexible feature range constraints for numerical and categorical attributes, including the immutability of protected features (e.g., gender, race); and iv) is easily extended to other data modalities such as images.
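To make properties i) and iii) concrete, the following is a minimal sketch and not the authors' implementation. It shows how a proposed counterfactual batch might be clipped to per-feature ranges, how immutable features might be restored, and how a reward could be computed from black-box predictions alone. The classifier `black_box_predict`, the helpers `apply_constraints` and `reward`, and all numeric values are hypothetical and chosen purely for illustration; the reinforcement learning actor that would produce the proposals is not shown.

```python
import numpy as np

def black_box_predict(x: np.ndarray) -> np.ndarray:
    """Hypothetical non-differentiable classifier: class 1 if the feature sum exceeds 1."""
    return (x.sum(axis=1) > 1.0).astype(int)

def apply_constraints(x, x_cf, lower, upper, immutable):
    """Clip proposals to per-feature ranges, then restore immutable features from x."""
    x_cf = np.clip(x_cf, lower, upper)    # np.clip returns a new array
    x_cf[:, immutable] = x[:, immutable]  # e.g. protected attributes stay unchanged
    return x_cf

def reward(predict, x_cf, target):
    """1.0 where the model's prediction on the counterfactual equals the target, else 0.0.
    Only predictions are used, so no gradients of the model are required."""
    return (predict(x_cf) == target).astype(float)

# Toy batch: two instances with three numerical features (assumed values).
x = np.array([[0.2, 0.3, 0.1],
              [0.9, 0.4, 0.0]])
# Proposals that a learned generator might output in one forward pass (assumed values).
proposal = np.array([[0.9, 0.3, 0.8],
                     [0.1, 0.2, 0.3]])
target = np.array([1, 0])                   # desired class per instance
lower = np.array([0.0, 0.0, 0.0])           # per-feature lower bounds
upper = np.array([1.0, 1.0, 0.6])           # per-feature upper bounds
immutable = np.array([False, True, False])  # feature 1 must not change

x_cf = apply_constraints(x, proposal, lower, upper, immutable)
r = reward(black_box_predict, x_cf, target)
l1_change = np.abs(x_cf - x).sum(axis=1)    # size of the change per instance
```

In this sketch the reward is a simple 0/1 signal on whether the target class was reached, and `l1_change` is one possible proxy for how minimal the change is; how these quantities are combined during training is a design choice the abstract does not specify.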