Existing statistical methods can estimate a policy, that is, a mapping from covariates to decisions, which can then guide decision makers. There is great interest in using such data-driven policies in healthcare, where, however, it is often important to explain to the healthcare provider, and to the patient, how a new policy differs from the current standard of care. Such explanations are easier if one can pinpoint the aspects (i.e., parameters) of the policy that change most when moving from the standard of care to the new, suggested policy. To this end, we adapt ideas from Trust Region Policy Optimization. Unlike in Trust Region Policy Optimization, however, our work requires the difference between the suggested policy and the standard of care to be sparse, which aids interpretability. In particular, we trade off maximizing expected reward against minimizing the $L_1$ norm of the difference between the parameters of the two policies. This yields "relative sparsity," where, as a function of a tuning parameter, $\lambda$, we can approximately control the number of parameters in our suggested policy that differ from their counterparts in the standard of care. We develop our methodology for the observational data setting. We propose a problem-specific criterion for selecting $\lambda$, perform simulations, and illustrate our method with a real, observational healthcare dataset, deriving a policy that is easy to explain in the context of the current standard of care. Our work promotes the adoption of data-driven decision aids, which have great potential to improve health outcomes.
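The sketch below illustrates the trade-off described above. It is not the authors' implementation, and it rests on several assumptions. The decision is binary. The suggested policy is logistic, $\pi_\beta(1 \mid x) = \sigma(x^\top \beta)$. The standard of care is a logistic policy with parameters $\beta_b$, treated here as known. Expected reward is estimated with an inverse-probability-weighted (importance sampling) estimate $\hat V(\beta)$. Under these assumptions, the sketch maximizes $\hat V(\beta) - \lambda \lVert \beta - \beta_b \rVert_1$ by proximal gradient ascent. The soft-thresholding step shrinks each coordinate of $\beta - \beta_b$ toward zero, so a larger $\lambda$ leaves more parameters exactly equal to their standard-of-care values. The function names (`ipw_value_and_grad`, `fit_relative_sparsity`), the simulated data, and the choice of optimizer are all hypothetical. In the observational setting, $\beta_b$ would itself have to be estimated, and the criterion for selecting $\lambda$ is not reproduced here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ipw_value_and_grad(beta, X, A, Y, p_b):
    """IPW estimate of the value of the logistic policy pi_beta, and its gradient.

    p_b[i] is the probability that the standard of care (behavior policy)
    assigned the observed action A[i]; here it is assumed known.
    """
    p1 = sigmoid(X @ beta)                   # P(A = 1 | x) under pi_beta
    pi_a = np.where(A == 1, p1, 1.0 - p1)    # pi_beta(A_i | X_i)
    w = pi_a / p_b                           # importance weights
    value = np.mean(w * Y)
    # d pi_beta(a | x) / d beta = pi_beta(a | x) * (a - p1) * x
    grad = X.T @ (w * Y * (A - p1)) / len(Y)
    return value, grad


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage toward 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_relative_sparsity(X, A, Y, beta_b, lam, step=0.1, iters=500):
    """Hypothetical proximal gradient ascent on V(beta) - lam * ||beta - beta_b||_1."""
    p1_b = sigmoid(X @ beta_b)
    p_b = np.where(A == 1, p1_b, 1.0 - p1_b)  # behavior propensities of observed actions
    beta = beta_b.copy()                       # start at the standard of care
    for _ in range(iters):
        _, grad = ipw_value_and_grad(beta, X, A, Y, p_b)
        z = beta + step * grad                 # gradient ascent step on V
        # Proximal step: shrink the *difference* from beta_b, not beta itself.
        beta = beta_b + soft_threshold(z - beta_b, step * lam)
    return beta


# Illustrative use on simulated (not real) data.
rng = np.random.default_rng(0)
n, d = 2000, 4
X = rng.normal(size=(n, d))
beta_b = np.array([0.5, -0.5, 0.0, 0.3])            # hypothetical standard of care
A = rng.binomial(1, sigmoid(X @ beta_b))
Y = X[:, 2] * (2 * A - 1) + rng.normal(size=n)      # A = 1 helps when X[:, 2] > 0
for lam in [0.0, 0.05, 10.0]:
    beta = fit_relative_sparsity(X, A, Y, beta_b, lam)
    print(lam, np.round(beta - beta_b, 3))          # nonzero entries = changed parameters
```

Because the objective is nonconvex in $\beta$, this routine returns a local solution that depends on the starting point (here $\beta_b$) and the step size. Starting at $\beta_b$ fits the goal of departing from the standard of care only where the data support it.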