Many technical approaches have been proposed for ensuring that decisions made by machine learning systems are fair, but few of these proposals have been stress-tested in real-world systems. This paper presents an example of one team's approach to the challenge of applying algorithmic fairness approaches to complex production systems within the context of a large technology company. We discuss how we disentangle normative questions of product and policy design (like, "how should the system trade off between different stakeholders' interests and needs?") from empirical questions of system implementation (like, "is the system achieving the desired tradeoff in practice?"). We also present an approach for answering questions of the latter sort, which allows us to measure how machine learning systems and human labelers are making these tradeoffs across different relevant groups. We hope our experience integrating fairness tools and approaches into large-scale and complex production systems will be useful to other practitioners facing similar challenges, and illuminating to academics and researchers looking to better address the needs of practitioners.