Methods to find counterfactual explanations have predominantly focused on one-step decision-making processes. In this work, we initiate the development of methods to find counterfactual explanations for decision-making processes in which multiple, dependent actions are taken sequentially over time. We start by formally characterizing a sequence of actions and states using finite-horizon Markov decision processes and the Gumbel-Max structural causal model. Building upon this characterization, we formally state the problem of finding counterfactual explanations for sequential decision-making processes. In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions, differing in at most k actions from the observed sequence, that could have led the observed process realization to a better outcome. Then, we introduce a polynomial-time algorithm based on dynamic programming to build a counterfactual policy that is guaranteed to provide the optimal counterfactual explanation on every possible realization of the counterfactual environment dynamics. We validate our algorithm using both synthetic data and real data from cognitive behavioral therapy, and show that the counterfactual explanations it finds can provide valuable insights to enhance sequential decision making under uncertainty.
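To make the dynamic programming idea concrete, the following is a minimal sketch of a backward recursion over time, state, and remaining budget of action changes. It is an illustration under stated assumptions, not the paper's implementation: the function name `counterfactual_policy`, the array layout, and the toy environment are all hypothetical, and the counterfactual transition probabilities `P` (which in this setting would be derived from the Gumbel-Max structural causal model and the observed realization) are simply assumed to be given.

```python
import numpy as np

def counterfactual_policy(P, R, observed_actions, k):
    """Backward dynamic program with a budget of at most k action changes.

    Assumed (hypothetical) inputs:
      P[t, s, a, s2]      counterfactual transition probabilities at step t
      R[s, a]             immediate reward for taking action a in state s
      observed_actions[t] action actually taken at step t
    Returns h[t, s, c] (best expected return from step t in state s with
    c changes still allowed) and pi[t, s, c] (an action achieving it).
    """
    T, S, A, _ = P.shape
    h = np.zeros((T + 1, S, k + 1))          # h[T] = 0: nothing left to collect
    pi = np.zeros((T, S, k + 1), dtype=int)
    for t in range(T - 1, -1, -1):
        for s in range(S):
            for c in range(k + 1):
                best_val, best_a = -np.inf, observed_actions[t]
                for a in range(A):
                    changed = a != observed_actions[t]
                    if changed and c == 0:
                        continue             # no budget left to deviate
                    # Deviating from the observed action consumes one change.
                    val = R[s, a] + P[t, s, a] @ h[t + 1, :, c - int(changed)]
                    if val > best_val:
                        best_val, best_a = val, a
                h[t, s, c] = best_val
                pi[t, s, c] = best_a
    return h, pi

# Toy example (hypothetical): 2 states, 2 actions, horizon 2.
# Action 0 stays in the current state, action 1 moves to state 1.
# Reward is 1 whenever the current state is 1, regardless of the action.
P = np.zeros((2, 2, 2, 2))
for t in range(2):
    for s in range(2):
        P[t, s, 0, s] = 1.0
        P[t, s, 1, 1] = 1.0
R = np.array([[0.0, 0.0], [1.0, 1.0]])
h, pi = counterfactual_policy(P, R, observed_actions=[0, 0], k=1)
```

In this toy case the observed sequence never leaves state 0, so `h[0, 0, 0]` (no changes allowed) is the observed return, while `h[0, 0, 1]` is the best return reachable by changing a single action. The nested loops visit each (step, state, budget, action) combination once and do one length-S dot product per visit, which is polynomial in the horizon, the number of states and actions, and k.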