Mitigating algorithmic bias is a critical task in the development and deployment of machine learning models. While several toolkits exist to aid machine learning practitioners in addressing fairness issues, little is known about the strategies practitioners employ to evaluate model fairness or the factors that influence their assessments, particularly in the context of text classification. Two common approaches to evaluating the fairness of a model are group fairness and individual fairness. We run a study with machine learning practitioners (n=24) to understand the strategies they use to evaluate models. We find that the metrics presented to practitioners (group vs. individual fairness) impact which models they consider fair. Participants focused on the risks associated with underprediction and overprediction, as well as on model sensitivity to identity-token manipulations. We also identify fairness assessment strategies that draw on personal experience or on forming groups of identity tokens to test model fairness. We conclude with recommendations for the design of interactive tools for evaluating fairness in text classification.