Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. Results: This release of PMLB aggregates in one location the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods. Version 1.0 introduces a number of critical improvements developed following discussions with the open-source community. Availability: PMLB is available at https://github.com/EpistasisLab/pmlb. Python and R interfaces for PMLB can be installed through the Python Package Index and the Comprehensive R Archive Network, respectively.