Despite the huge successes reported in the field of machine learning (ML), such as speech assistants and self-driving cars, businesses still observe a very high failure rate when deploying ML in production. We argue that part of the reason is infrastructure that was not designed for activities around data collection and analysis. We propose considering flow-based programming (FBP) with data streams as an alternative to the commonly used service-oriented architectures (SOA) for building software applications. To compare FBP with the widespread service-oriented approach, we develop a data processing application and formulate two subsequent ML-related tasks that together constitute a complete cycle of ML deployment, allowing us to assess the characteristics of each programming paradigm in the ML context. Employing both code metrics and empirical observations, we show that each paradigm has certain advantages and drawbacks when it comes to ML deployment. Our main conclusion is that while FBP shows great potential for providing infrastructural benefits for the deployment of machine learning, it requires a lot of boilerplate code to define and manipulate the dataflow graph. We believe that with better developer tools in place this problem can be alleviated, establishing FBP as a strong alternative to the currently prevalent SOA-driven software design approach. Additionally, we provide insight into the trend of prioritising model development over data quality management.