PMLB v1.0: an open source dataset collection for benchmarking machine learning methods

Trang T. Le, William La Cava, Joseph D. Romano, John T. Gregg, Daniel J. Goldberg, Praneel Chakraborty, Natasha L. Ray, Daniel Himmelstein, Weixuan Fu, Jason H. Moore

from arXiv, 6 pages, 2 figures

PMLB (Penn Machine Learning Benchmark) is an open-source data repository containing a curated collection of datasets for evaluating and comparing machine learning (ML) algorithms. Compiled from a broad range of existing ML benchmark collections, PMLB synthesizes and standardizes hundreds of publicly available datasets from diverse sources such as the UCI ML repository and OpenML, enabling systematic assessment of different ML methods. These datasets cover a range of applications, from binary/multi-class classification to regression problems with combinations of categorical and continuous features. PMLB has both a Python interface (pmlb) and an R interface (pmlbr), both with detailed documentation that allows the user to access cleaned and formatted datasets using a single function call. PMLB also provides a comprehensive description of each dataset and advanced functions to explore the dataset space, allowing for smoother user experience and handling of data. The resource is designed to facilitate open-source contributions in the form of datasets as well as improvements to curation.
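The single-call access described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' own example: it assumes the `pmlb` package is installed and relies on its `fetch_data` function with the `return_X_y` and `local_cache_dir` parameters. The helper names `load_xy` and `summarize` are hypothetical, and the class-count summary only makes sense for a classification dataset.

```python
from collections import Counter


def load_xy(name, fetch=None, cache_dir=None):
    """Return (X, y) for one PMLB dataset using a single fetch call.

    `fetch` is a hypothetical injection point: it defaults to pmlb.fetch_data
    and can be replaced by a stand-in when working offline.
    """
    if fetch is None:
        # Imported lazily so the helper can be defined without pmlb installed.
        from pmlb import fetch_data as fetch
    return fetch(name, return_X_y=True, local_cache_dir=cache_dir)


def summarize(X, y):
    """Report sample count, feature count, and class balance of the target."""
    # NumPy scalars are converted to plain Python values for readable keys.
    labels = [v.item() if hasattr(v, "item") else v for v in y]
    return {
        "n_samples": len(X),
        "n_features": len(X[0]) if len(X) else 0,
        "class_counts": dict(Counter(labels)),
    }
```

With network access, `X, y = load_xy("mushroom")` followed by `summarize(X, y)` would download the dataset on first use; passing `cache_dir` should let later calls read from a local copy instead.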


