Objective. Chemical named entity recognition (NER) models have the potential to impact a wide range of downstream tasks, from identifying adverse drug reactions to general pharmacoepidemiology. However, it is unknown whether these models work the same for everyone. Performance disparities can cause harm rather than the intended good. Hence, in this paper, we measure gender-related performance disparities of chemical NER systems.

Materials and Methods. We develop a framework to measure gender bias in chemical NER models using synthetic data and a newly annotated dataset of 92,405 words with self-identified gender information from Reddit. We apply and evaluate state-of-the-art biomedical NER models.

Results. Our findings indicate that chemical NER models are biased. The bias tests on both the synthetic dataset and the real-world data reveal multiple fairness issues. For example, on the synthetic data, we find that female-related names are frequently misclassified as chemicals, particularly by models trained on datasets that contain many brand names rather than standard chemical names. On both datasets, we find consistent fairness issues resulting in substantial performance disparities between female- and male-related data.

Discussion. Our study highlights the problem of bias in chemical NER models. For example, we find that many systems cannot detect contraceptives (e.g., birth control).

Conclusion. Chemical NER models are biased and can be harmful to female-related groups. Therefore, practitioners should carefully consider the potential biases of these models and take steps to mitigate them.
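To make the measurement concrete, below is a minimal sketch of the kind of synthetic-data probe the abstract describes: person names from female- and male-related lists are inserted into fixed sentence templates, a chemical NER model tags each sentence, and we compare how often the inserted name is wrongly tagged as a chemical per group. The templates, the name lists, and the Hugging Face pipeline adapter (including the placeholder model path) are illustrative assumptions, not the paper's actual implementation.

```python
"""Minimal sketch of a synthetic-data gender bias probe for chemical NER.

Assumptions (not from the paper): the templates, the name lists, and the
Hugging Face token-classification adapter are illustrative choices; any
model fine-tuned for chemical NER could be plugged in via `tag_chemicals`.
"""

# Templates where a person name appears in a drug-related context; a fair
# model should never tag the inserted name as a chemical.
TEMPLATES = [
    "I talked to {name} about switching medications.",
    "{name} said the side effects started last week.",
]

# Illustrative female- and male-related name lists (hypothetical examples).
NAMES = {
    "female": ["Emily", "Aisha", "Maria"],
    "male": ["James", "Omar", "Carlos"],
}


def false_positive_rates(tag_chemicals):
    """Compute, per group, how often the inserted name is tagged as a chemical.

    `tag_chemicals(sentence)` must return the set of surface strings the
    model labeled as chemicals in that sentence.
    """
    rates = {}
    for group, names in NAMES.items():
        hits, total = 0, 0
        for name in names:
            for template in TEMPLATES:
                sentence = template.format(name=name)
                hits += int(name in tag_chemicals(sentence))
                total += 1
        rates[group] = hits / total
    return rates


if __name__ == "__main__":
    # Example adapter around a Hugging Face pipeline; the model path is a
    # placeholder, and the CHEM label check is a heuristic for this sketch.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="path/to/chemical-ner-model",  # placeholder, not a real checkpoint
        aggregation_strategy="simple",
    )

    def tag_chemicals(sentence):
        return {e["word"] for e in ner(sentence) if "CHEM" in e["entity_group"].upper()}

    rates = false_positive_rates(tag_chemicals)
    print(rates, "disparity (female - male):", rates["female"] - rates["male"])
```

A positive disparity under this sketch would mean female-related names are misread as chemicals more often than male-related ones, which is one of the failure modes the abstract reports; the paper's full framework additionally evaluates real-world Reddit data with self-identified gender.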