Fairness is critical for artificial intelligence systems, especially those deployed in high-stakes applications such as hiring and criminal justice. Existing efforts toward fairness in machine learning require retraining or fine-tuning the neural network weights to meet the fairness criteria. However, this is often infeasible in practice for regular model users, who cannot access or modify the model weights. In this paper, we propose a more flexible fairness paradigm, Inference-Time Rule Eraser, or simply Eraser, which considers the case where model weights cannot be accessed and tackles fairness issues from the perspective of removing biased rules at inference time. We first verify, through Bayesian analysis, the feasibility of removing biased rules by modifying the model's output, and derive Inference-Time Rule Eraser: the logarithmic value associated with unfair rules (i.e., the model's response to biased features) is subtracted from the model's logits output to remove the biased rules. Moreover, we present a specific implementation of Rule Eraser that involves two stages: (1) a limited number of queries are performed on the model with inaccessible weights to distill its biased rules into an additional patched model, and (2) during inference, the biased rules distilled into the patched model are excluded from the output of the original model, following the removal strategy prescribed by Rule Eraser. Extensive experimental evaluation demonstrates the effectiveness and superior performance of the proposed Rule Eraser in addressing fairness concerns.
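The core removal step described above is a subtraction in log space. The sketch below illustrates that idea under our own assumptions: `logits` stands for the output of the deployed (weight-inaccessible) model and `bias_logits` for the output of a patched model that has distilled the biased rules; the function name and renormalization details are illustrative, not the paper's API.

```python
import torch
import torch.nn.functional as F

def erase_biased_rule(logits: torch.Tensor, bias_logits: torch.Tensor) -> torch.Tensor:
    """Minimal sketch of inference-time rule removal by log-space subtraction.

    Assumptions (not from the paper's code): `logits` come from the deployed
    model, `bias_logits` from a patched model capturing the biased rules.
    """
    # log p(y | x) implied by the deployed model's logits
    log_probs = F.log_softmax(logits, dim=-1)
    # logarithm of the biased rule, i.e. the model's response to biased features,
    # as approximated by the patched model
    bias_log_probs = F.log_softmax(bias_logits, dim=-1)
    # subtract the biased rule in log space, then renormalize into a distribution
    return F.log_softmax(log_probs - bias_log_probs, dim=-1)

# Hypothetical usage: query both models on the same input, then debias.
logits = torch.randn(4, 3)        # placeholder for deployed-model outputs
bias_logits = torch.randn(4, 3)   # placeholder for patched-model outputs
debiased_log_probs = erase_biased_rule(logits, bias_logits)
```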