With the widespread adoption of sensors and smart devices in recent years, the rate at which Internet of Things (IoT) systems generate data has increased dramatically. In IoT systems, massive volumes of data must be processed, transformed, and analyzed frequently to enable various IoT services and functionalities. Machine Learning (ML) approaches have shown their capacity for IoT data analytics. However, applying ML models to IoT data analytics tasks still faces many challenges, specifically effective model selection, design/tuning, and updating, which have created a strong demand for experienced data scientists. Additionally, the dynamic nature of IoT data may introduce concept drift, degrading model performance. To reduce human effort, Automated Machine Learning (AutoML) has become a popular field that aims to automatically select, construct, tune, and update machine learning models to achieve the best performance on specified tasks. In this paper, we review existing AutoML methods for model selection, tuning, and updating in order to identify and summarize the optimal solutions for every step of applying ML algorithms to IoT data analytics. To justify our findings and help industrial users and researchers better implement AutoML approaches, we conduct a case study applying AutoML to IoT anomaly detection problems. Lastly, we discuss and classify the challenges and research directions for this domain.