Training a fair machine learning model is essential to prevent demographic disparity. Existing techniques for improving model fairness require broad changes to either data preprocessing or model training, making them difficult to adopt in potentially already complex machine learning systems. We address this problem through the lens of bilevel optimization. While keeping the standard training algorithm as the inner optimizer, we incorporate an outer optimizer that equips the inner problem with an additional capability: adaptively selecting minibatch sizes for the purpose of improving model fairness. Our batch selection algorithm, which we call FairBatch, implements this optimization and supports prominent fairness measures: equal opportunity, equalized odds, and demographic parity. FairBatch comes with a significant implementation benefit: it does not require any modification to data preprocessing or model training. For instance, a single-line change of PyTorch code, replacing the batch selection part of model training, suffices to employ FairBatch. Our experiments on both synthetic and benchmark real datasets demonstrate that FairBatch provides these capabilities while achieving performance comparable to (or even better than) the state of the art. Furthermore, FairBatch can readily improve the fairness of any pre-trained model simply via fine-tuning. It is also compatible with existing batch selection techniques designed for other purposes, such as faster convergence, thus gracefully serving multiple goals at once.
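To make the "single-line change of PyTorch code" claim concrete, the sketch below illustrates the general idea of adaptive, fairness-aware batch selection under some assumptions: a custom batch sampler draws each minibatch from sensitive groups according to adjustable sampling rates, and an outer update nudges those rates toward the group with the larger loss. The names `GroupAdaptiveSampler`, `lambda_rates`, and `adjust_rates` are hypothetical illustrations, not the authors' released FairBatch API.

```python
# Hypothetical sketch of adaptive, group-aware batch selection in PyTorch.
# Not the authors' implementation; names and update rule are illustrative.
import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset


class GroupAdaptiveSampler(Sampler):
    """Yields minibatch index lists; each batch draws from sensitive groups
    in proportion to adaptable sampling rates. Only the minibatch
    distribution changes; the model and training loop stay untouched."""

    def __init__(self, group_labels, batch_size, lambda_rates):
        self.group_indices = [
            (group_labels == g).nonzero(as_tuple=True)[0]
            for g in torch.unique(group_labels)
        ]
        self.batch_size = batch_size
        self.lambda_rates = lambda_rates  # one rate per group, sums to 1

    def __iter__(self):
        n_batches = sum(len(idx) for idx in self.group_indices) // self.batch_size
        for _ in range(n_batches):
            batch = []
            for rate, idx in zip(self.lambda_rates, self.group_indices):
                k = max(1, int(round(rate * self.batch_size)))
                batch.append(idx[torch.randint(len(idx), (k,))])
            yield torch.cat(batch).tolist()

    def __len__(self):
        return sum(len(idx) for idx in self.group_indices) // self.batch_size


def adjust_rates(lambda_rates, group_losses, step=0.01):
    """Illustrative outer update: shift sampling mass toward the group with
    the larger loss, then clamp and renormalize."""
    worst = max(range(len(group_losses)), key=lambda g: group_losses[g])
    rates = [r + step if g == worst else r - step / (len(lambda_rates) - 1)
             for g, r in enumerate(lambda_rates)]
    total = sum(max(r, 1e-3) for r in rates)
    return [max(r, 1e-3) / total for r in rates]


# Usage on toy data: the only visible change to an existing pipeline is the
# sampler argument handed to the DataLoader.
x = torch.randn(1000, 5)
y = torch.randint(0, 2, (1000,))
z = torch.randint(0, 2, (1000,))  # sensitive attribute
dataset = TensorDataset(x, y, z)
sampler = GroupAdaptiveSampler(z, batch_size=64, lambda_rates=[0.5, 0.5])
loader = DataLoader(dataset, batch_sampler=sampler)
```

In this reading, the inner optimizer is whatever training loop already consumes `loader`, while the outer optimizer periodically calls something like `adjust_rates` on per-group validation losses; the swap of the sampler is the only edit an existing training script would need.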