Evaluation is a key part of machine learning (ML), yet there is a lack of support and tooling to enable its informed and systematic practice. We introduce Evaluate and Evaluation on the Hub, a set of tools to facilitate the evaluation of models and datasets in ML. Evaluate is a library to support best practices for measurements, metrics, and comparisons of data and models. Its goal is to support reproducibility of evaluation, centralize and document the evaluation process, and broaden evaluation to cover more facets of model performance. It includes over 50 efficient canonical implementations for a variety of domains and scenarios, interactive documentation, and the ability to easily share implementations and outcomes. The library is available at https://github.com/huggingface/evaluate. In addition, we introduce Evaluation on the Hub, a platform that enables the large-scale evaluation of over 75,000 models and 11,000 datasets on the Hugging Face Hub, for free, at the click of a button. Evaluation on the Hub is available at https://huggingface.co/autoevaluate.
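As a minimal sketch of the usage pattern the abstract describes (loading one of the library's canonical metric implementations and computing it on toy labels; assumes `pip install evaluate` and its dependencies):

```python
import evaluate

# Load the canonical implementation of a metric by name.
accuracy = evaluate.load("accuracy")

# Compute the metric on example references (ground truth) and predictions.
results = accuracy.compute(
    references=[0, 1, 1, 0],
    predictions=[0, 1, 0, 0],
)
print(results)  # e.g. {'accuracy': 0.75}
```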