Bayesian Improved Surname Geocoding (BISG) is a ubiquitous tool for predicting race and ethnicity from an individual's geolocation and surname. BISG assumes that, in the United States population, surname and geolocation are independent conditional on race or ethnicity. This assumption appears to contradict conventional wisdom, for instance that people often live near relatives who share their surname and race. We demonstrate that this independence assumption results in systematic biases for minority subpopulations, and we introduce a simple alternative to BISG. Our raking-based prediction algorithm offers a significant improvement over BISG, and we validate it on states' voter registration lists that contain self-identified race/ethnicity. Both the proposed improvement and the inaccuracies of BISG generalize to applications in election law, health care, finance, technology, law enforcement, and many other fields.
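To make the independence assumption concrete, the following is a minimal sketch of a BISG-style posterior. If surname and geolocation are independent given race, Bayes' rule gives P(race | surname, geo) proportional to P(race | surname) × P(geo | race). The function name, group labels, and all probabilities below are hypothetical and chosen only for illustration; they are not taken from the paper or from Census data.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geolocation evidence under conditional independence.

    Both arguments are dicts keyed by the same (hypothetical) race labels.
    Returns P(race | surname, geo) as a dict that sums to 1.
    """
    # Unnormalized weight per race: P(race | surname) * P(geo | race).
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total <= 0:
        raise ValueError("all candidate races have zero weight")
    # Normalize so the posterior is a proper probability distribution.
    return {race: weight / total for race, weight in unnormalized.items()}


# Made-up inputs for one person: a surname prior and a location likelihood.
p_race_given_surname = {"group_a": 0.70, "group_b": 0.20, "group_c": 0.10}
p_geo_given_race = {"group_a": 0.001, "group_b": 0.004, "group_c": 0.002}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
most_likely = max(posterior, key=posterior.get)
```

In this toy case the location evidence outweighs the surname prior, so the most probable label shifts from `group_a` to `group_b`. The sketch also shows where the assumption bites: the product form has no term that lets a surname be more common in one neighborhood than another within the same race.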
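The abstract does not spell out the raking-based algorithm, so the following is only a sketch of generic two-way raking (iterative proportional fitting), the standard technique the name refers to, and not the authors' method. It rescales a seed table until its row and column sums match known target margins. The function name `rake`, the seed table, and the targets are all hypothetical.

```python
def rake(seed, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Generic two-way raking: alternately rescale rows and columns of `seed`
    until its margins match `row_targets` and `col_targets`."""
    fitted = [list(row) for row in seed]  # copy so the seed is left untouched
    for _ in range(max_iter):
        # Scale each row so its sum matches the row target.
        for i, row in enumerate(fitted):
            row_sum = sum(row)
            if row_sum > 0:
                factor = row_targets[i] / row_sum
                fitted[i] = [value * factor for value in row]
        # Scale each column so its sum matches the column target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in fitted)
            if col_sum > 0:
                factor = target / col_sum
                for row in fitted:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows match as well.
        worst = max(abs(sum(row) - t) for row, t in zip(fitted, row_targets))
        if worst < tol:
            break
    return fitted


# Hypothetical seed counts and target margins (both sets of targets sum to 100).
seed = [[5.0, 1.0], [2.0, 4.0]]
row_targets = [30.0, 70.0]
col_targets = [40.0, 60.0]
fitted = rake(seed, row_targets, col_targets)
```

Raking keeps the association structure of the seed table while forcing agreement with the margins, which is one way to relax a strict independence assumption when reliable marginal totals are available.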