With the increasing number of Internet of Things (IoT) devices, massive amounts of raw data are being generated. The latency, cost, and other challenges of cloud-based IoT data processing have driven the adoption of Edge and Fog computing models, in which some data processing tasks are moved closer to the data sources. Properly handling the flow of such data requires building data pipelines that control the complete life cycle of data streams, from data acquisition at the source, through edge and fog processing, to Cloud-side storage and analytics. Data analytics tasks need to be executed dynamically at different distances from the data sources and often on very heterogeneous hardware devices. This can be streamlined by using a Serverless (or FaaS) cloud computing model, where tasks are defined as virtual functions that can be migrated from edge to cloud (and vice versa) and executed in an event-driven manner on data streams. In this work, we investigate the benefits of building Serverless data pipelines (SDP) for IoT data analytics and evaluate three different approaches for designing SDPs: 1) Off-the-shelf data flow tool (DFT) based, 2) Object storage service (OSS) based, and 3) MQTT based. Further, we applied these strategies to three fog applications (Aeneas, PocketSphinx, and a custom video processing application) and evaluated their performance by comparing processing time (computation time, network communication, and disk access time) and resource utilization. Results show that DFT is unsuitable for compute-intensive applications such as video or image processing, whereas OSS is best suited for this task. However, DFT is a good fit for bandwidth-intensive applications due to its minimal use of network resources. On the other hand, the MQTT-based SDP shows an increase in CPU and memory usage as the number of...<truncated to fit character limit in arXiv>