Despite the widely reported successes of machine learning, such as voice assistants and self-driving cars, businesses still observe a very high failure rate when deploying ML in production. We argue that part of the reason is infrastructure that was not designed for data-oriented activities. This paper explores the potential of flow-based programming (FBP) for simplifying data discovery and collection in software systems. We compare FBP with the currently prevalent service-oriented paradigm to assess the characteristics of each paradigm in the context of ML deployment. We develop a data processing application, formulate a subsequent ML deployment task, and measure the impact of implementing the task within both programming paradigms. Our main conclusion is that FBP shows great potential for providing data-centric infrastructural benefits for the deployment of ML. Additionally, we provide insight into the current trend that prioritizes model development over data quality management.