Unfairness in mortgage lending has created generational inequality among racial and ethnic groups in the US. Many studies address this problem, but most existing work relies on correlation-based techniques. In this work, we use the framework of counterfactual fairness to train fair machine learning models. We propose a new causal graph for the variables available in the Home Mortgage Disclosure Act (HMDA) data. We use a matching-based approach rather than latent variable modeling, because matching does not rely on any modeling assumptions. Furthermore, matching yields counterfactual pairs in which the race variable is isolated. We first demonstrate unfairness in mortgage approval and interest rates between African-American and non-Hispanic White sub-populations. We then show that balancing the data through matching does not guarantee perfect counterfactual fairness of the machine learning models.
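To make the idea of matched counterfactual pairs concrete, the following is a minimal sketch and not the procedure used in this work, whose matching algorithm is not specified here. It assumes greedy one-to-one nearest-neighbor matching with a caliper on pre-scaled covariates. The function names, the covariates, the caliper value, and the toy records are all hypothetical. The second function measures how often a model's decisions differ within matched pairs, which is one simple way to probe whether balanced data alone yields equal treatment.

```python
from math import dist

def match_pairs(group_a, group_b, caliper):
    """Greedy 1-to-1 nearest-neighbor matching on covariates (illustrative only).

    group_a, group_b: lists of covariate tuples, assumed already scaled.
    Returns (index_in_a, index_in_b) pairs whose distance is within the caliper.
    """
    unused_b = list(range(len(group_b)))  # candidates not yet matched
    pairs = []
    for i, x in enumerate(group_a):
        if not unused_b:
            break
        # Closest remaining record in group_b; ties go to the lowest index.
        j = min(unused_b, key=lambda k: dist(x, group_b[k]))
        if dist(x, group_b[j]) <= caliper:
            pairs.append((i, j))
            unused_b.remove(j)
    return pairs

def pair_disagreement_rate(pairs, pred_a, pred_b):
    """Share of matched pairs that receive different model decisions."""
    if not pairs:
        return 0.0
    return sum(pred_a[i] != pred_b[j] for i, j in pairs) / len(pairs)

# Hypothetical toy records: (scaled income, scaled loan-to-value ratio).
group_a = [(0.2, 0.8), (0.5, 0.5), (0.9, 0.1)]
group_b = [(0.21, 0.79), (0.55, 0.5), (0.1, 0.1), (0.9, 0.9)]
pairs = match_pairs(group_a, group_b, caliper=0.1)

# Hypothetical approve (1) / deny (0) decisions from some trained model.
pred_a = [1, 1, 0]
pred_b = [1, 0, 0, 1]
rate = pair_disagreement_rate(pairs, pred_a, pred_b)
```

Within each pair the covariates are nearly identical, so the group label is the main remaining difference. A nonzero disagreement rate on such pairs would indicate that the model treats otherwise similar applicants differently.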