Recent advances in software and hardware have enabled the use of AI/ML models in everyday applications, significantly improving the quality of service rendered. However, for a given application, finding the right AI/ML model is a complex and costly process that involves generating, training, and evaluating multiple sequences of interlinked steps (called pipelines), such as data pre-processing, feature engineering, feature selection, and model tuning. These pipelines are complex in structure and costly, in both compute resources and time, to execute end-to-end, with hyper-parameters associated with each step. AutoML systems automate the search over these hyper-parameters but are slow, as they rely on optimizing each pipeline's end output. We propose the eTOP framework, which works on top of any AutoML system and decides whether to execute a pipeline to the end or terminate it at an intermediate step. Experimental evaluation on 26 benchmark datasets, with eTOP integrated into MLBox, shows up to a 40x reduction in AutoML training time over baseline MLBox.
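The core idea above, deciding at an intermediate pipeline step whether further execution is worthwhile, can be illustrated with a minimal sketch. The abstract does not specify eTOP's actual decision criterion, so this example assumes a simple hypothetical rule: terminate if the intermediate validation score trails the best score seen so far by more than a tolerance. All function names here (`should_terminate`, `run_pipeline`) are illustrative, not from the paper.

```python
# Hypothetical sketch of early pipeline termination, assuming the decision
# reduces to comparing an intermediate score against the best seen so far.
# eTOP's real criterion is not given in the abstract.

def should_terminate(intermediate_score, best_score_so_far, tolerance=0.05):
    """Return True if the pipeline looks unpromising at this step."""
    return intermediate_score < best_score_so_far - tolerance

def run_pipeline(steps, best_score_so_far, tolerance=0.05):
    """Execute pipeline steps in order, stopping early if unpromising.

    `steps` is a list of callables; each takes the current artifact and
    returns (new_artifact, estimated_validation_score).
    Returns (artifact, last_score, ran_to_completion).
    """
    artifact, score = None, 0.0
    for step in steps:
        artifact, score = step(artifact)
        if should_terminate(score, best_score_so_far, tolerance):
            return artifact, score, False  # terminated at this step
    return artifact, score, True  # executed end-to-end
```

The saving comes from skipping the remaining (typically expensive) steps, such as model training and tuning, for pipelines whose intermediate signals already look weak.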