Pitfalls in Machine Learning Research: Reexamining the Development Cycle

Machine learning has the potential to fuel further advances in data science, but it is greatly hindered by an ad hoc design process, poor data hygiene, and a lack of statistical rigor in model evaluation. Recently, these issues have begun to attract more attention as they have caused public and embarrassing issues in research and development. Drawing from our experience as machine learning researchers, we follow the machine learning process from algorithm design to data collection to model evaluation, drawing attention to common pitfalls and providing practical recommendations for improvements. At each step, case studies are introduced to highlight how these pitfalls occur in practice, and where things could be improved.

