The success of today's AI applications requires not only model training (model-centric) but also data engineering (data-centric). In data-centric AI, active learning (AL) plays a vital role, but current AL tools cannot perform AL tasks efficiently. To this end, this paper presents ALaaS (Active-Learning-as-a-Service), an efficient MLOps system for AL. Specifically, ALaaS adopts a server-client architecture to support an AL pipeline and implements stage-level parallelism for high efficiency. Caching and batching techniques are also employed to further accelerate the AL process. In addition to efficiency, ALaaS ensures accessibility through the design philosophy of configuration-as-a-service. It also abstracts an AL process into several components and provides rich APIs for advanced users to extend the system to new scenarios. Extensive experiments show that ALaaS outperforms all baselines in terms of latency and throughput. Further ablation studies demonstrate the effectiveness of our design as well as ALaaS's ease of use. Our code is available at \url{https://github.com/MLSysOps/alaas}.
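As a rough illustration of the ideas named in the abstract (stage-level parallelism, caching, and batching), the following minimal Python sketch runs a fetch stage and a scoring stage concurrently, connected by a queue. It is not the ALaaS API: the names `fetch`, `score_batch`, and `run_pipeline`, the toy feature extraction, and the uncertainty score are all hypothetical and chosen only to keep the example self-contained.

```python
import queue
import threading
from functools import lru_cache

_DONE = object()  # sentinel marking the end of the input stream


@lru_cache(maxsize=1024)
def fetch(uri: str) -> tuple:
    """Hypothetical download stage; repeated URIs are served from the cache."""
    return tuple(float(ord(ch) % 7) for ch in uri)  # toy "features"


def score_batch(batch):
    """Hypothetical model stage: one toy uncertainty score per sample."""
    return [1.0 / (1.0 + sum(features)) for features in batch]


def run_pipeline(uris, batch_size=2, budget=2):
    """Run fetch and scoring as concurrent stages, then pick `budget` samples."""
    fetched = queue.Queue(maxsize=8)  # bounded buffer between the two stages
    scored = queue.Queue()            # unbounded, drained after both stages end

    def fetch_stage():
        for uri in uris:
            fetched.put((uri, fetch(uri)))
        fetched.put(_DONE)

    def score_stage():
        batch = []
        while True:
            item = fetched.get()
            if item is not _DONE:
                batch.append(item)
            # Flush when the batch is full or the stream has ended.
            if batch and (item is _DONE or len(batch) == batch_size):
                scores = score_batch([features for _, features in batch])
                for (uri, _), score in zip(batch, scores):
                    scored.put((uri, score))
                batch = []
            if item is _DONE:
                break

    stages = [threading.Thread(target=fetch_stage),
              threading.Thread(target=score_stage)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()

    results = []
    while not scored.empty():
        results.append(scored.get())
    # Select the samples with the highest toy uncertainty score.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return [uri for uri, _ in results[:budget]]


if __name__ == "__main__":
    print(run_pipeline(["a.jpg", "b.jpg", "a.jpg", "c.png"]))
```

Because the scoring stage consumes items while the fetch stage is still producing them, neither stage waits for the other to finish the whole dataset. A real system would replace the toy functions with actual data loading and model inference.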