Local decision rules are commonly regarded as more explainable because the patterns they capture are local. Combined with numerical optimization methods such as gradient boosting, ensembles of local decision rules can achieve good predictive performance on data with global structure. Meanwhile, machine learning models are increasingly used to solve problems in high-stakes domains, including healthcare and finance. In these domains, there is an emerging consensus that practitioners need to understand whether and how such models perform robustly in deployment environments in the presence of distributional shifts. Past research on local decision rules has focused mainly on maximizing the discriminative power of the patterns, without due consideration of robustness against distributional shifts. To fill this gap, we propose a new method to learn and ensemble local decision rules that are robust in both the training and deployment environments. Specifically, we leverage causal knowledge by regarding the distributional shifts in subpopulations and deployment environments as the results of interventions on the underlying system. We propose two regularization terms based on causal knowledge to search for optimal and stable rules. Experiments on both synthetic and benchmark datasets show that our method is effective and robust against distributional shifts in multiple environments.