Existing statistical methods can estimate a policy, that is, a mapping from covariates to decisions, which can then guide decision makers (e.g., whether to administer hypotension treatment based on the covariates blood pressure and heart rate). There is great interest in using such data-driven policies in healthcare. However, it is often important to explain to the healthcare provider, and to the patient, how a new policy differs from the current standard of care. Such an explanation is easier if one can pinpoint the aspects of the policy (i.e., the parameters for blood pressure and heart rate) that change when moving from the standard of care to the new, suggested policy. To this end, we adapt ideas from Trust Region Policy Optimization (TRPO). Unlike in TRPO, however, we require the difference between the suggested policy and the standard of care to be sparse, which aids interpretability. This yields ``relative sparsity,'' where, as a function of a tuning parameter, $\lambda$, we can approximately control the number of parameters in our suggested policy that differ from their counterparts in the standard of care (e.g., heart rate only). We propose a criterion for selecting $\lambda$, perform simulations, and illustrate our method with a real, observational healthcare dataset, deriving a policy that is easy to explain in the context of the current standard of care. Our work promotes the adoption of data-driven decision aids, which have great potential to improve health outcomes.
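To make the role of $\lambda$ concrete, the following is a minimal sketch of the general idea, not the estimator used in this work. It assumes an $\ell_1$ penalty on the difference between the suggested parameters `theta` and the standard-of-care parameters `beta`, a toy quadratic stand-in for the policy value, and a proximal-gradient solver; the names `relative_sparse_fit`, `count_changed`, `theta_star`, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink each entry of x toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def relative_sparse_fit(grad_value, beta, lam, step=0.1, n_iter=500):
    """Proximal-gradient ascent on value(theta) - lam * ||theta - beta||_1.

    ASSUMPTION: the L1 penalty and this solver are illustrative choices,
    not taken from the abstract.
    """
    theta = beta.copy()
    for _ in range(n_iter):
        # Gradient step on the (toy) value function.
        proposal = theta + step * grad_value(theta)
        # Shrink the *difference from beta*, so small changes snap back to beta.
        theta = beta + soft_threshold(proposal - beta, step * lam)
    return theta


# Hypothetical coefficients for (blood pressure, heart rate).
beta = np.array([0.5, -1.0])        # standard of care
theta_star = np.array([0.6, 1.0])   # maximizer of the toy value function


def grad_value(theta):
    """Gradient of the toy value -0.5 * ||theta - theta_star||^2."""
    return -(theta - theta_star)


def count_changed(lam):
    """Number of parameters that differ from the standard of care at this lam."""
    theta = relative_sparse_fit(grad_value, beta, lam)
    return int(np.sum(~np.isclose(theta, beta)))


for lam in (0.0, 0.5, 3.0):
    print(lam, count_changed(lam))
```

In this toy setting, increasing `lam` reduces the number of parameters that move away from `beta`: at an intermediate value only the heart-rate coefficient changes, which mirrors the "heart rate only" example above. Using a real off-policy value estimate in place of the quadratic would change the gradient but not the structure of the penalty step.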