The need to measure bias encoded in tabular data used to solve pattern recognition problems is widely recognized by academia, legislators and enterprises alike. In previous work, we proposed a bias quantification measure, called fuzzy-rough uncertainty, which relies on fuzzy-rough set theory. The intuition is that protected features should not significantly change the fuzzy-rough boundary regions of a decision class. The extent to which they do is a proxy for bias, expressed as uncertainty in a decision-making context. The main advantage of our measure is that it depends only on a distance function rather than on any machine learning prediction model. In this paper, we extend our study by exploring bias encoded implicitly in non-protected features, as defined by the correlation between protected and unprotected attributes. This analysis leads to four scenarios that domain experts should evaluate before deciding how to tackle bias. In addition, we conduct a sensitivity analysis to determine the fuzzy operators and distance function that best capture changes in the boundary regions.
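The boundary-region intuition can be illustrated with a minimal sketch. It is not the authors' exact definition of fuzzy-rough uncertainty: the Łukasiewicz implicator and t-norm, the mean absolute difference used as the distance, the averaging of the change, and all function names are assumptions chosen for illustration. Features are assumed to be numeric and scaled to [0, 1].

```python
def similarity(a, b):
    """Fuzzy similarity: 1 minus the mean absolute difference (assumed distance)."""
    return 1.0 - sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def boundary_memberships(X, y, label):
    """Membership of each instance in the fuzzy-rough boundary region of `label`."""
    members = [1.0 if c == label else 0.0 for c in y]
    result = []
    for x in X:
        sims = [similarity(x, z) for z in X]
        # Lower approximation with the Lukasiewicz implicator (assumed choice).
        lower = min(min(1.0, 1.0 - r + a) for r, a in zip(sims, members))
        # Upper approximation with the Lukasiewicz t-norm (assumed choice).
        upper = max(max(0.0, r + a - 1.0) for r, a in zip(sims, members))
        # Boundary region taken as upper minus lower approximation.
        result.append(upper - lower)
    return result


def boundary_change(X, y, label, protected_idx):
    """Mean absolute change in boundary membership when one feature is dropped.

    Hypothetical summary statistic; requires at least two features.
    """
    reduced = [[v for j, v in enumerate(row) if j != protected_idx] for row in X]
    full = boundary_memberships(X, y, label)
    without = boundary_memberships(reduced, y, label)
    return sum(abs(f - w) for f, w in zip(full, without)) / len(X)


# Toy data: feature 0 is constant, feature 1 (treated as protected) separates the classes.
X = [[0.5, 0.0], [0.5, 0.0], [0.5, 1.0], [0.5, 1.0]]
y = [0, 0, 1, 1]
change = boundary_change(X, y, label=1, protected_idx=1)
print(change)
```

In this toy case, dropping the protected feature makes all instances indistinguishable, so every instance moves fully into the boundary region and the measured change is large. A feature whose removal leaves the boundary memberships nearly unchanged would yield a value close to zero.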