Machine learning applications are becoming increasingly pervasive in our society. Since these decision-making systems rely on data-driven learning, the risk is that they will systematically propagate the bias embedded in the data. In this paper, we propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations. We delve into the nature of these biases, discussing their relationship to moral and justice frameworks. Finally, we exploit our proposed synthetic data generator to perform experiments on different scenarios with various bias combinations, analyzing the impact of biases on performance and fairness metrics in both non-mitigated and mitigated machine learning models.
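To make the idea of injecting a controlled bias into synthetic data concrete, here is a minimal sketch (not the paper's actual generator): it creates synthetic examples with a binary sensitive attribute, flips positive labels for the disadvantaged group with a chosen probability (a hypothetical label-bias model), and measures the resulting gap with a simple demographic parity difference. All function names and parameters are illustrative assumptions.

```python
import random

def generate_biased_data(n=10000, flip_prob=0.3, seed=0):
    """Generate synthetic (group, true_label, biased_label) tuples.

    Label bias is injected by flipping positive labels of the
    disadvantaged group (group=0) to negative with probability
    flip_prob -- a hypothetical bias model for illustration.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        group = rng.randint(0, 1)                 # sensitive attribute
        label = 1 if rng.random() < 0.5 else 0    # unbiased ground truth
        biased = label
        if group == 0 and label == 1 and rng.random() < flip_prob:
            biased = 0                            # bias against group 0
        data.append((group, label, biased))
    return data

def demographic_parity_diff(data, label_idx):
    """Positive-rate gap between groups for the chosen label column."""
    rates = {}
    for g in (0, 1):
        rows = [d for d in data if d[0] == g]
        rates[g] = sum(d[label_idx] for d in rows) / len(rows)
    return rates[1] - rates[0]

data = generate_biased_data()
gap_true = demographic_parity_diff(data, 1)    # near zero by construction
gap_biased = demographic_parity_diff(data, 2)  # grows with flip_prob
```

On the unbiased labels the two groups have roughly equal positive rates, while on the biased labels group 0 loses about `flip_prob` of its positives, so the parity gap widens; sweeping `flip_prob` is one way such a generator can control bias intensity.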