Machine learning models are updated as new data is acquired or new architectures are developed. These updates usually increase model performance but may introduce backward compatibility errors, in which individual users or groups of users see their performance degrade on the updated model. This problem can also arise when training datasets do not accurately reflect overall population demographics because some groups participate less in the data collection process, which poses a significant fairness concern. We analyze how ideas from distributional robustness and minimax fairness can aid backward compatibility in this scenario, and we propose two methods that directly address this issue. Our theoretical analysis is supported by experimental results on three standard image classification datasets: CIFAR-10, CelebA, and Waterbirds. Code is available at github.com/natalialmg/GroupBC
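To make "backward compatibility errors" for groups concrete, the sketch below measures, per group, the fraction of samples that the old model classified correctly but the updated model gets wrong (often called the negative flip rate), and then reports the worst group, echoing the minimax-fairness view of focusing on the most affected group. This is an illustrative sketch only, not the two methods proposed here. The metric definition, the function names `group_negative_flip_rates` and `worst_group`, and the toy labels are all hypothetical assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical helper (not from the paper): per-group negative flip rate,
# i.e. the share of a group's samples that the old model got right but the
# new model gets wrong.
def group_negative_flip_rates(y_true, old_pred, new_pred, groups):
    y_true = np.asarray(y_true)
    old_pred = np.asarray(old_pred)
    new_pred = np.asarray(new_pred)
    groups = np.asarray(groups)
    # A negative flip: old prediction correct AND new prediction incorrect.
    neg_flip = (old_pred == y_true) & (new_pred != y_true)
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        rates[g.item()] = float(neg_flip[mask].mean())
    return rates

# Hypothetical helper: the group with the largest negative flip rate,
# i.e. the worst-case (minimax) view of backward compatibility.
def worst_group(rates):
    g = max(rates, key=rates.get)
    return g, rates[g]

# Toy example (made-up data): group 0 is well served by the old model,
# group 1 is not.
y_true = [1, 1, 1, 1, 1, 1, 1, 1]
old_pred = [1, 1, 1, 1, 0, 0, 0, 1]
new_pred = [1, 0, 1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

old_acc = float(np.mean(np.asarray(old_pred) == np.asarray(y_true)))
new_acc = float(np.mean(np.asarray(new_pred) == np.asarray(y_true)))
rates = group_negative_flip_rates(y_true, old_pred, new_pred, groups)

print(old_acc, new_acc)      # overall accuracy rises from 0.625 to 0.875
print(rates)                 # {0: 0.25, 1: 0.0}
print(worst_group(rates))    # (0, 0.25)
```

In this toy case the update improves overall accuracy, yet a quarter of group 0's samples flip from correct to incorrect, which is the kind of group-level regression that an average-case metric hides.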