The Shapley value (SV) is adopted in various machine learning (ML) scenarios, including data valuation, agent valuation, and feature attribution, as it satisfies their fairness requirements. However, as exact SVs are infeasible to compute in practice, they are approximated by SV estimates instead. This approximation step raises an important question: do SV estimates preserve the fairness guarantees of exact SVs? We observe that the fairness guarantees of exact SVs are too restrictive for SV estimates. Thus, we generalise Shapley fairness to probably approximate Shapley fairness and propose the fidelity score, a metric that measures the variation of SV estimates and determines how probable it is that the fairness guarantees hold. Our last theoretical contribution is a novel greedy active estimation (GAE) algorithm that maximises the lowest fidelity score and achieves a better fairness guarantee than the de facto Monte Carlo estimation. We empirically verify that GAE outperforms several existing methods in guaranteeing fairness while remaining competitive in estimation accuracy across various ML scenarios on real-world datasets.
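For context, the following is a minimal sketch of the de facto Monte Carlo estimator the abstract refers to (not the proposed GAE algorithm): SVs are approximated by averaging marginal contributions over uniformly sampled player permutations. The function names (`monte_carlo_shapley`, `value_fn`) and the toy additive game are illustrative assumptions, not from the paper.

```python
import numpy as np

def monte_carlo_shapley(value_fn, n_players, n_samples, rng=None):
    """Estimate Shapley values by averaging each player's marginal
    contribution over uniformly sampled permutations (standard
    Monte Carlo estimation; names here are illustrative)."""
    rng = np.random.default_rng(rng)
    estimates = np.zeros(n_players)
    for _ in range(n_samples):
        perm = rng.permutation(n_players)
        coalition = []
        prev_value = value_fn(frozenset())  # value of the empty coalition
        for player in perm:
            coalition.append(player)
            cur_value = value_fn(frozenset(coalition))
            # Marginal contribution of `player` given its predecessors in perm.
            estimates[player] += cur_value - prev_value
            prev_value = cur_value
    return estimates / n_samples

# Toy additive game: the coalition value is the sum of per-player
# weights, so the exact SVs equal the weights themselves.
weights = np.array([1.0, 2.0, 3.0])
v = lambda S: float(sum(weights[i] for i in S))
print(monte_carlo_shapley(v, n_players=3, n_samples=2000, rng=0))
# Expected output: approximately [1.0, 2.0, 3.0]
```

Because each permutation gives an unbiased marginal-contribution sample, the estimate converges to the exact SV, but its per-player variance is uncontrolled; the abstract's fidelity score and GAE algorithm target precisely this variation to obtain probabilistic fairness guarantees.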