Missing data are ubiquitous in the era of big data and, if inadequately handled, are known to lead to biased findings and to harm data-driven decision making. To mitigate their impact, many missing value imputation methods have been developed. However, the fairness of these imputation methods across sensitive groups has not been studied. In this paper, we conduct the first known study of the fairness of missing data imputation. By studying the performance of imputation methods on three commonly used datasets, we demonstrate that unfairness in missing value imputation is widespread and may be associated with multiple factors. Our results suggest that, in practice, a careful investigation of these factors can provide valuable insights into mitigating the unfairness associated with missing data imputation.
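To make "fairness of imputation across sensitive groups" concrete, the sketch below shows one way such a comparison could be set up. The abstract specifies neither the imputation methods nor the fairness measure. This sketch therefore assumes simple mean imputation and measures unfairness as the gap between the largest and smallest per-group RMSE, computed only on the entries that were actually missing. The helper names `mean_impute` and `group_rmse_gap` and the toy data are hypothetical illustrations, not the authors' method.

```python
import numpy as np


def mean_impute(x, missing_mask):
    """Hypothetical helper: fill missing entries with the mean of the observed ones."""
    x_imp = x.astype(float).copy()
    x_imp[missing_mask] = x[~missing_mask].mean()
    return x_imp


def group_rmse_gap(x_true, x_imputed, missing_mask, groups):
    """Hypothetical helper: per-group RMSE on the imputed entries and the max-min gap.

    Assumes the gap between groups' imputation errors is the unfairness measure.
    """
    rmse = {}
    for g in np.unique(groups).tolist():  # tolist() yields plain Python scalars
        sel = (groups == g) & missing_mask  # only entries that were imputed
        if sel.any():
            err = x_true[sel] - x_imputed[sel]
            rmse[g] = float(np.sqrt(np.mean(err ** 2)))
    gap = max(rmse.values()) - min(rmse.values())
    return rmse, gap


# Toy example: one feature, two sensitive groups "a" and "b".
groups = np.array(["a", "a", "a", "b", "b", "b"])
x_true = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
missing = np.array([True, False, False, True, False, False])

x_imp = mean_impute(x_true, missing)  # the observed mean is 7.0
rmse, gap = group_rmse_gap(x_true, x_imp, missing, groups)
print(rmse, gap)
```

Here the pooled mean sits closer to group "b", so group "a" has twice the imputation error. A single overall accuracy figure would hide this kind of disparity. Other imputers or error metrics could be substituted without changing the structure of the comparison.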