Tree-based models are used in many high-stakes application domains such as finance and medicine, where robustness and interpretability are of utmost importance. Yet, methods for improving and certifying their robustness are severely under-explored, in contrast to those focusing on neural networks. Targeting this important challenge, we propose deterministic smoothing for decision stump ensembles. Whereas most prior work on randomized smoothing focuses on evaluating arbitrary base models approximately under input randomization, the key insight of our work is that decision stump ensembles enable exact yet efficient evaluation via dynamic programming. Importantly, we obtain deterministic robustness certificates, even jointly over numerical and categorical features, a setting ubiquitous in the real world. Further, we derive an MLE-optimal training method for smoothed decision stumps under randomization and propose two boosting approaches to improve their provable robustness. An extensive experimental evaluation on computer vision and tabular data tasks shows that our approach yields significantly higher certified accuracies than the state-of-the-art for tree-based models. We release all code and trained models at https://github.com/eth-sri/drs.
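The exact-evaluation insight can be illustrated with a minimal sketch (not the authors' code, and simplified to independent stumps): if each stump fires with some probability under the input randomization, the exact distribution of the ensemble's vote count follows from a dynamic-programming convolution, with no Monte Carlo sampling needed.

```python
def vote_count_distribution(probs):
    """Exact distribution of the number of stumps voting 1.

    probs[i] is the (assumed independent) probability that stump i
    outputs 1 under the smoothing distribution. Returns dist where
    dist[k] = P(exactly k stumps vote 1), computed by DP in O(n^2).
    """
    dist = [1.0]  # with 0 stumps processed, the count is 0 w.p. 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)   # this stump does not fire
            new[k + 1] += q * p       # this stump fires
        dist = new
    return dist


# The smoothed ensemble predicts class 1 when the majority of stumps
# fire; its exact probability is a tail sum of the DP table.
def majority_prob(probs):
    dist = vote_count_distribution(probs)
    return sum(dist[len(probs) // 2 + 1:])
```

Because the table is exact, the resulting certificates are deterministic rather than probabilistic, which is the contrast with sampling-based randomized smoothing drawn in the abstract.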