In recent years, the data science community has pursued excellence and devoted significant research effort to developing advanced analytics, focusing on technical problems at the expense of organizational and socio-technical challenges. According to previous surveys on the state of data science project management, there is a significant gap between technical and organizational processes. In this article we present new empirical data from a survey of 237 data science professionals on the use of project management methodologies for data science. We also profile the survey respondents' roles and their priorities when executing data science projects. The main findings of the survey are: (1) The Agile data science lifecycle is the most widely used framework, but only 25% of the survey participants report following a data science project methodology. (2) The most important success factors are precisely describing stakeholders' needs, communicating the results to end-users, and team collaboration and coordination. (3) Professionals who adhere to a project methodology place greater emphasis on the project's potential risks and pitfalls, version control, the deployment pipeline to production, and data security and privacy.