Machine learning models are often deployed in concert with humans in the pipeline, with the model having the option to defer to a domain expert on inputs where it has low confidence in its inference. Our goal is to design mechanisms for ensuring accuracy and fairness in such prediction systems that combine machine learning model inferences and domain expert predictions. Prior work on "deferral systems" in classification settings has focused on pipelines with a single expert, aiming to accommodate that expert's inaccuracies and biases while simultaneously learning an inference model and a deferral system. Our work extends this framework to settings where multiple experts are available, each with their own domain of expertise and biases. We propose a framework that simultaneously learns a classifier and a deferral system, with the deferral system choosing to defer to one or more human experts on inputs where the classifier has low confidence. We test our framework on a synthetic dataset and a content moderation dataset with biased synthetic experts, and show that it significantly improves the accuracy and fairness of the final predictions compared to baselines. We also collect crowdsourced labels for the content moderation task to construct a real-world dataset for the evaluation of hybrid machine-human frameworks, and show that our proposed learning framework outperforms baselines on this real-world dataset as well.
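To make the joint classifier-plus-deferral idea concrete, here is a minimal sketch in PyTorch. It is not the paper's exact objective: it uses a common cross-entropy surrogate in which "defer to expert j" is treated as an extra class that is rewarded whenever expert j's label agrees with the ground truth. All class names, the loss form, and the hyperparameters (`ClassifierWithDeferral`, `deferral_loss`, hidden size, learning rate) are illustrative assumptions.

```python
# Minimal sketch (assumed surrogate, not the paper's exact method): a shared
# backbone with one logit per class plus one "defer" logit per expert.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassifierWithDeferral(nn.Module):
    def __init__(self, in_dim, n_classes, n_experts, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # First n_classes logits predict a label; last n_experts logits defer.
        self.head = nn.Linear(hidden, n_classes + n_experts)

    def forward(self, x):
        return self.head(self.backbone(x))  # (batch, n_classes + n_experts)

def deferral_loss(logits, y, expert_preds):
    """Cross-entropy surrogate: push up the true-class logit and the defer
    logit of every expert whose prediction matches the ground truth.
    expert_preds: (batch, n_experts) tensor of each expert's label."""
    log_p = F.log_softmax(logits, dim=1)
    n_classes = log_p.shape[1] - expert_preds.shape[1]
    # Classifier term: negative log-likelihood of the true label.
    loss = -log_p.gather(1, y.unsqueeze(1)).squeeze(1)
    # Deferral terms: reward deferring to experts that are correct on x.
    correct = (expert_preds == y.unsqueeze(1)).float()   # (batch, n_experts)
    loss = loss - (correct * log_p[:, n_classes:]).sum(dim=1)
    return loss.mean()

# Toy usage with synthetic features and two simulated (possibly biased) experts.
torch.manual_seed(0)
x = torch.randn(32, 10)
y = torch.randint(0, 3, (32,))
expert_preds = torch.randint(0, 3, (32, 2))
model = ClassifierWithDeferral(in_dim=10, n_classes=3, n_experts=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(5):
    opt.zero_grad()
    deferral_loss(model(x), y, expert_preds).backward()
    opt.step()

# At inference, the argmax decides whether to predict a class or defer.
decision = model(x).argmax(dim=1)
defer_mask = decision >= 3  # indices >= n_classes mean "defer to that expert"
```

Under these assumptions, fairness constraints or per-expert deferral costs would enter as additional terms in `deferral_loss`; the sketch only shows the accuracy-driven joint training of the classifier and the deferral head.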