A significant impediment to progress in research on bias in machine learning (ML) is the limited availability of relevant datasets. This situation is unlikely to change much, given the sensitivity of such data. For this reason, synthetic data has a role to play in this research. In this short paper, we present one such family of synthetic datasets. We provide an overview of the data, describe how the level of bias can be varied, and present a simple example of an experiment on the data.