We argue that, when establishing and benchmarking Machine Learning (ML) models, the research community should favour evaluation metrics that better capture the value delivered by those models in practical applications. For a specific class of use cases -- selective classification -- we show that not only is this simple enough to do, but that it also has important consequences and provides insights into what to look for in a ``good'' ML model.