Often, what is termed algorithmic bias in machine learning is due to historical bias in the training data. Sometimes, however, the bias may be introduced, or at least exacerbated, by the algorithm itself. The ways in which algorithms can accentuate bias have not received much attention, with researchers instead focusing directly on methods to eliminate bias, regardless of its source. In this paper we report on initial research to understand the factors that contribute to bias in classification algorithms. We believe this is important because underestimation bias is inextricably tied to regularization; that is, measures to address overfitting can accentuate bias.
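The link between regularization and underestimation can be seen in a minimal sketch (not the paper's experimental setup): on a synthetic, imbalanced dataset, a heavily regularized logistic regression model (scikit-learn's `LogisticRegression`, where smaller `C` means stronger L2 regularization) shrinks its weights toward zero and predicts the minority class far less often than a lightly regularized one, i.e. it underestimates the minority class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Imbalanced synthetic data: ~10% positives, shifted right along one feature.
y = (rng.random(n) < 0.1).astype(int)
X = y[:, None] * 2.0 + rng.normal(size=(n, 1))

def predicted_positive_rate(C):
    """Fraction of examples predicted positive under L2 strength 1/C."""
    clf = LogisticRegression(C=C).fit(X, y)
    return clf.predict(X).mean()

weak = predicted_positive_rate(C=100.0)   # light regularization
strong = predicted_positive_rate(C=0.001) # heavy regularization

# Heavy regularization drives the weights toward zero, so predictions fall
# back on the (majority-dominated) base rate: the positive class is
# underestimated relative to the lightly regularized model.
print(f"predicted positive rate, light reg: {weak:.3f}")
print(f"predicted positive rate, heavy reg: {strong:.3f}")
```

This is only an illustration under assumed synthetic data; the point it mirrors from the text is that a measure intended to prevent overfitting can itself suppress predictions for an underrepresented class.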