Tree-based models are used in many high-stakes application domains such as finance and medicine, where robustness and interpretability are of utmost importance. Yet, methods for improving and certifying their robustness are severely under-explored, in contrast to those focusing on neural networks. Targeting this important challenge, we propose deterministic smoothing for decision stump ensembles. Whereas most prior work on randomized smoothing focuses on evaluating arbitrary base models approximately under input randomization, the key insight of our work is that decision stump ensembles enable exact yet efficient evaluation via dynamic programming. Importantly, we obtain deterministic robustness certificates, even jointly over numerical and categorical features, a setting ubiquitous in the real world. Further, we derive an MLE-optimal training method for smoothed decision stumps under randomization and propose two boosting approaches to improve their provable robustness. An extensive experimental evaluation shows that our approach yields significantly higher certified accuracies than the state-of-the-art for tree-based models. We release all code and trained models at ANONYMIZED.
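To make the dynamic-programming insight concrete, the following is a minimal sketch under simplifying assumptions: Gaussian input noise, binary stump votes, and at most one stump per numerical feature, so that votes are independent Bernoullis with closed-form probabilities. The names `Stump`, `smoothed_class1_probability`, and `sigma` are illustrative, not the paper's API.

```python
# Minimal sketch (not the authors' implementation): under per-feature
# Gaussian noise, each decision stump's vote is a Bernoulli whose success
# probability has a closed form, and the exact distribution of the
# ensemble's vote count follows by dynamic programming (Poisson binomial).
from dataclasses import dataclass
from statistics import NormalDist

@dataclass
class Stump:
    feature: int      # index of the (numerical) feature this stump splits on
    threshold: float  # stump votes class 1 iff x[feature] + noise <= threshold

def smoothed_class1_probability(stumps, x, sigma):
    """Exact P(majority of stumps votes class 1) under N(0, sigma^2) noise,
    assuming at most one stump per feature (hence independent votes)."""
    phi = NormalDist().cdf
    # Per-stump probability that the noisy feature falls below the threshold.
    ps = [phi((s.threshold - x[s.feature]) / sigma) for s in stumps]
    # DP over stumps: dp[k] = P(exactly k of the stumps seen so far vote 1).
    dp = [1.0]
    for p in ps:
        dp = [(dp[k] * (1 - p) if k < len(dp) else 0.0)
              + (dp[k - 1] * p if k > 0 else 0.0)
              for k in range(len(dp) + 1)]
    majority = len(stumps) // 2 + 1
    return sum(dp[majority:])
```

Stumps sharing a feature are not independent; the paper's method handles this (and categorical features) by first aggregating stumps per feature before the dynamic program, which is omitted here for brevity.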