Demographic classification is essential for assessing fairness in recommender systems and for measuring unintended bias in online networks and voting systems. Important fields such as education and politics, which often lay the foundation for future equality in society, need scrutiny so that policies can be designed to better foster equality in resource distribution under the constraints of the country's unbalanced demographic distribution. We collect three publicly available datasets to train state-of-the-art classifiers for gender and caste classification. We train the models in the Indian context, where the same name can follow different styling conventions (Jolly Abraham/Kumar Abhishikta in one state may be written as Abraham Jolly/Abhishikta Kumar in another). We also perform cross-testing (training and testing on different datasets) to understand the efficacy of these models, and we carry out an error analysis of their predictions. Finally, we assess bias in the existing Indian system through case studies and find some intriguing patterns manifesting in the complex demographic layout of the sub-continent across the dimensions of gender and caste.