This work focuses on quantitative verification of fairness in tree ensembles. Unlike traditional verification approaches, which merely return a single counterexample when fairness is violated, quantitative verification estimates the proportion of the input space occupied by counterexamples and characterizes the regions where they occur, information that is important for diagnosing and mitigating bias. To date, quantitative verification has been explored almost exclusively for deep neural networks (DNNs). Representative methods such as DeepGemini and FairQuant build on the core idea of Counterexample-Guided Abstraction Refinement, a generic framework that can in principle be adapted to other model classes. We extended this framework into a model-agnostic form but found two limitations: (i) it provides only lower bounds, and (ii) it scales poorly. Exploiting the discrete structure of tree ensembles, our work proposes an efficient quantification technique that delivers anytime upper and lower bounds. Experiments on five widely used datasets demonstrate its effectiveness and efficiency. When applied to fairness testing, our quantification method significantly outperforms state-of-the-art testing techniques.
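To make the quantity being bounded concrete, the sketch below estimates the individual-fairness violation ratio of a tree ensemble by naive Monte Carlo sampling with a Hoeffding confidence interval. This is only an illustrative baseline under assumptions not taken from the paper (a scikit-learn RandomForestClassifier, inputs uniform on [0, 1]^d, a binary protected attribute at a hypothetical column index PROTECTED_IDX); it yields probabilistic bounds, not the deterministic anytime upper and lower bounds of the proposed technique.

```python
# Illustrative sketch (not the paper's method): Monte Carlo estimate of the
# individual-fairness violation ratio of a tree ensemble, with a 95%
# Hoeffding bound. Assumes a scikit-learn ensemble, inputs uniform on
# [0, 1]^d, and a binary protected attribute at a hypothetical column index.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

PROTECTED_IDX = 0       # hypothetical position of the protected feature
N_SAMPLES = 100_000     # number of sampled inputs

def violation_ratio(model, dim, rng):
    """Fraction of sampled inputs whose prediction flips when only the
    protected attribute is toggled (i.e., counterexamples to individual
    fairness), plus a 95% two-sided Hoeffding bound on the true ratio."""
    x = rng.random((N_SAMPLES, dim))
    x[:, PROTECTED_IDX] = rng.integers(0, 2, N_SAMPLES)      # binary attribute
    x_flipped = x.copy()
    x_flipped[:, PROTECTED_IDX] = 1 - x_flipped[:, PROTECTED_IDX]
    diff = model.predict(x) != model.predict(x_flipped)
    ratio = diff.mean()
    eps = np.sqrt(np.log(2 / 0.05) / (2 * N_SAMPLES))        # Hoeffding radius
    return max(0.0, ratio - eps), ratio, min(1.0, ratio + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy ensemble trained on random labels, just to have a model to query.
    X, y = rng.random((2_000, 5)), rng.integers(0, 2, 2_000)
    X[:, PROTECTED_IDX] = rng.integers(0, 2, 2_000)
    model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
    lo, est, hi = violation_ratio(model, dim=5, rng=rng)
    print(f"estimated violation ratio: {est:.4f} (95% bound: [{lo:.4f}, {hi:.4f}])")
```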