Numerical validation is at the core of machine learning research, as it allows researchers to assess the actual impact of new methods and to confirm the agreement between theory and practice. Yet the rapid development of the field poses several challenges: researchers are confronted with a profusion of methods to compare, limited transparency and consensus on best practices, as well as tedious re-implementation work. As a result, validation is often very partial, which can lead to wrong conclusions that slow down the progress of research. We propose Benchopt, a collaborative framework to automate, reproduce and publish optimization benchmarks in machine learning across programming languages and hardware architectures. Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments. To demonstrate its broad usability, we showcase benchmarks on three standard learning tasks: $\ell_2$-regularized logistic regression, Lasso, and ResNet18 training for image classification. These benchmarks highlight key practical findings that give a more nuanced view of the state of the art for these problems, showing that for practical evaluation, the devil is in the details. We hope that Benchopt will foster collaborative work in the community, thereby improving the reproducibility of research findings.
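For concreteness, the first two tasks correspond to standard convex objectives; a minimal sketch under one common convention is given below (the $1/n$ and $1/2n$ scalings, the regularization parameter $\lambda$, and labels $y_i \in \{-1, +1\}$ are assumptions here, and the benchmarks may use slightly different scalings).
% Lasso: squared loss with an l1 penalty on the weights w,
% for a design matrix X in R^{n x p} and targets y in R^n.
\begin{align*}
  \min_{w \in \mathbb{R}^p} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1
\end{align*}
% l2-regularized logistic regression, with rows x_i of X and labels y_i in {-1, +1}.
\begin{align*}
  \min_{w \in \mathbb{R}^p} \; \frac{1}{n} \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i\, x_i^\top w)\bigr) + \frac{\lambda}{2} \lVert w \rVert_2^2
\end{align*}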