Given a learning problem with real-world tradeoffs, which cost function should the model be trained to optimize? This is the metric selection problem in machine learning. Despite its practical importance, there is limited formal guidance on how to select metrics for machine learning applications. This thesis outlines metric elicitation as a principled framework for selecting the performance metric that best reflects implicit user preferences. Once specified, the evaluation metric can be used to compare and train models. In this manuscript, we formalize the problem of Metric Elicitation and devise novel strategies for eliciting classification performance metrics using pairwise preference feedback over classifiers. Specifically, we provide novel strategies for eliciting linear and linear-fractional metrics for binary and multiclass classification problems, which are then extended to a framework that elicits group-fair performance metrics in the presence of multiple sensitive groups. All the elicitation strategies that we discuss are robust to both finite-sample and feedback noise, and thus are useful in real-world applications. Using these tools and the geometric characterizations of the feasible sets of confusion statistics from the binary, multiclass, and multiclass-multigroup classification setups, we further provide strategies to elicit a wider range of complex, modern multiclass metrics defined as quadratic functions of confusion statistics, by exploiting their local linear structure. From an application perspective, we also propose to use the metric elicitation framework for optimizing complex black-box metrics in a manner amenable to deep network training. Lastly, to bring theory closer to practice, we conduct a preliminary real-user study that shows the efficacy of the metric elicitation framework in recovering users' preferred performance metrics in a binary classification setup.
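To make the setup concrete, the following is a minimal illustrative formulation of the objects the abstract refers to; the notation ($\mathbf{a}$, $\mathbf{c}(h)$, $\Gamma$) is chosen here for exposition and is not fixed by the abstract itself. A linear performance metric scores a classifier through its confusion statistics, and elicitation assumes access only to pairwise preference feedback between classifiers:
\[
\phi_{\mathbf{a}}(h) \;=\; \langle \mathbf{a}, \mathbf{c}(h) \rangle,
\qquad
\Gamma(h_1, h_2) \;=\; \mathbb{1}\!\left[\, \phi_{\mathbf{a}^*}(h_1) \ge \phi_{\mathbf{a}^*}(h_2) \,\right],
\]
where $\mathbf{c}(h)$ collects the confusion-matrix entries of classifier $h$, $\mathbf{a}^*$ is an unknown weight vector encoding the user's implicit preferences, and the goal of metric elicitation is to recover $\mathbf{a}^*$ (up to scale) from a small number of queries to the preference oracle $\Gamma$. Linear-fractional, group-fair, and quadratic metrics generalize this template while retaining the same query model.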