Recent advances in AI and ML applications have benefited from rapid progress in NLP research. Leaderboards have emerged as a popular mechanism to track and accelerate progress in NLP through competitive model development. While this has increased interest and participation, over-reliance on single, accuracy-based metrics has shifted focus away from other important metrics that may be equally pertinent in real-world contexts. In this paper, we offer a preliminary discussion of the risks associated with focusing exclusively on accuracy metrics and draw on recent discussions to highlight prescriptive suggestions on how to develop more practical and effective leaderboards that better reflect the real-world utility of models.