Machine learning models trained on uncurated datasets can often end up adversely affecting individuals belonging to underrepresented groups. To address this issue, we consider the problem of adaptively constructing training sets that allow us to learn classifiers that are fair in a minimax sense. We first propose an adaptive sampling algorithm based on the principle of optimism, and derive theoretical bounds on its performance. We also propose heuristic extensions of this algorithm suitable for application to large-scale, practical problems. Next, by deriving algorithm-independent lower bounds for a specific class of problems, we show that the performance achieved by our adaptive scheme cannot be improved in general. We then validate the benefits of adaptively constructing training sets via experiments on synthetic tasks with logistic regression classifiers, as well as on several real-world tasks using convolutional neural networks (CNNs).
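To illustrate the flavor of optimism-based adaptive sampling described above, the following is a minimal, hypothetical sketch (not the paper's actual algorithm): a UCB-style rule that, at each round, draws the next training example from the group whose estimated risk has the highest upper confidence bound, thereby targeting the worst-off (minimax) group. The `group_losses` interface and the specific confidence bonus are illustrative assumptions.

```python
import math
import random

def optimistic_group_sampling(group_losses, n_rounds, delta=0.05):
    """UCB-style adaptive sampling sketch: each round, sample from the group
    whose estimated risk has the highest upper confidence bound (optimism),
    so the training set concentrates on the worst-performing group.

    group_losses: list of callables; group_losses[g]() returns a stochastic
    loss in [0, 1] for one fresh sample from group g (hypothetical interface).
    Returns the number of samples drawn from each group.
    """
    k = len(group_losses)
    counts = [0] * k
    sums = [0.0] * k
    # Initialize by drawing one sample from every group.
    for g in range(k):
        sums[g] += group_losses[g]()
        counts[g] += 1
    for t in range(k, n_rounds):
        # Optimistic (upper confidence bound) estimate of each group's risk.
        ucb = [
            sums[g] / counts[g]
            + math.sqrt(math.log(2 * k * (t + 1) / delta) / (2 * counts[g]))
            for g in range(k)
        ]
        g = max(range(k), key=lambda i: ucb[i])  # focus on the worst-looking group
        sums[g] += group_losses[g]()
        counts[g] += 1
    return counts

random.seed(0)
# Two hypothetical groups: group 1 incurs higher expected loss,
# mimicking an underrepresented group the classifier serves poorly.
losses = [lambda: random.random() * 0.4,        # mean risk ~0.2
          lambda: 0.4 + random.random() * 0.4]  # mean risk ~0.6
counts = optimistic_group_sampling(losses, 200)
```

Because the optimistic estimate for the high-risk group stays largest, the sampler allocates most of its budget there, which is the qualitative behavior the adaptive scheme aims for.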