The success of today's AI applications requires not only model training (model-centric) but also data engineering (data-centric). In data-centric AI, active learning (AL) plays a vital role, but current AL tools 1) require users to manually select AL strategies, and 2) cannot perform AL tasks efficiently. To this end, this paper presents an automatic and efficient MLOps system for AL, named ALaaS (Active-Learning-as-a-Service). Specifically, 1) ALaaS implements an AL agent, including a performance predictor and a workflow controller, that decides the most suitable AL strategies given users' datasets and budgets; we call this procedure predictive-based successive halving early-stop (PSHEA). 2) ALaaS adopts a server-client architecture to support an AL pipeline and implements stage-level parallelism for high efficiency, with caching and batching techniques further accelerating the AL process. In addition to efficiency, ALaaS ensures accessibility through its configuration-as-a-service design philosophy. Extensive experiments show that ALaaS outperforms all other baselines in terms of latency and throughput. Moreover, guided by the AL agent, ALaaS can automatically select and run AL strategies for non-expert users under different datasets and budgets. Our code is available at https://github.com/MLSysOps/Active-Learning-as-a-Service.
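The abstract does not spell out how PSHEA works, so the following is only a minimal illustrative sketch of the general idea its name suggests: run several candidate AL strategies round by round, use a predictor to estimate each strategy's next-round score, keep the better half, and stop early once a target is met. Every name here (`predict_next`, `select_strategy`, the toy strategy curves, the linear-extrapolation predictor) is a hypothetical stand-in and is not taken from the ALaaS codebase.

```python
from typing import Callable, Dict, List


def predict_next(history: List[float]) -> float:
    """Hypothetical performance predictor: linear extrapolation
    from the last two observed scores (an assumption, not ALaaS's model)."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


def select_strategy(
    strategies: Dict[str, Callable[[int], float]],
    rounds: int,
    target: float,
) -> str:
    """Successive halving over AL strategies with an early stop.

    Each strategy is modeled as a function mapping a round number to the
    score observed after that round (a stand-in for a real AL round).
    """
    alive = list(strategies)
    history: Dict[str, List[float]] = {name: [] for name in alive}
    for r in range(1, rounds + 1):
        # Run one more AL round for every surviving strategy.
        for name in alive:
            history[name].append(strategies[name](r))
        best = max(alive, key=lambda n: history[n][-1])
        # Early stop: target reached, or only one candidate remains.
        if history[best][-1] >= target or len(alive) == 1:
            return best
        # Rank by predicted next-round score and keep the top half.
        ranked = sorted(alive, key=lambda n: predict_next(history[n]), reverse=True)
        alive = ranked[: max(1, len(ranked) // 2)]
    return max(alive, key=lambda n: history[n][-1])


# Toy learning curves, purely for illustration.
toy_strategies = {
    "random": lambda r: 0.50 + 0.02 * r,
    "entropy": lambda r: 0.50 + 0.05 * r,
    "margin": lambda r: 0.45 + 0.06 * r,
    "coreset": lambda r: 0.55 + 0.01 * r,
}
winner = select_strategy(toy_strategies, rounds=4, target=0.95)
```

In this toy run a slow starter such as `margin` is eliminated in the first round even though its curve is the steepest, which is the usual trade-off of successive halving: the budget saved by discarding candidates early comes at the risk of dropping a late bloomer, and a better predictor is what mitigates that risk.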