The explosion of data volumes generated by a growing number of applications is strongly impacting the evolution of distributed digital infrastructures for data analytics and machine learning (ML). While data analytics used to be performed mainly on cloud infrastructures, the rapid development of IoT infrastructures and the requirements for low-latency, secure processing have motivated the development of edge analytics. Today, to balance various trade-offs, ML-based analytics increasingly leverages an interconnected ecosystem that allows complex applications to be executed on hybrid infrastructures where IoT Edge devices are interconnected with Cloud/HPC systems, in what is called the Computing Continuum, the Digital Continuum, or the Transcontinuum.

Enabling learning-based analytics on such complex infrastructures is challenging. The large-scale, optimized deployment of learning-based workflows across the Edge-to-Cloud Continuum requires extensive and reproducible experimental analysis of application execution on representative testbeds. This is necessary to help understand the performance trade-offs that result from combining a variety of learning paradigms and supporting frameworks. A thorough experimental analysis requires assessing the impact of multiple factors, such as model accuracy, training time, network overhead, energy consumption, and processing latency.

This review aims to provide a comprehensive view of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today. It describes the main learning paradigms enabling learning-based analytics on the Edge-to-Cloud Continuum. The main simulation, emulation, and deployment systems and testbeds for experimental research on the Edge-to-Cloud Continuum are also surveyed. Furthermore, we analyze how the selected systems support experiment reproducibility.
We conclude our review with a detailed discussion of relevant open research challenges and future directions in this domain, such as holistic understanding of performance; performance optimization of applications; efficient deployment of Artificial Intelligence (AI) workflows on highly heterogeneous infrastructures; and reproducible analysis of experiments on the Computing Continuum.