Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

With the ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network. Existing machine learning inference platforms typically assume a homogeneous infrastructure and do not account for the more complex, tiered computing infrastructure that spans edge devices, local hubs, edge datacenters, and cloud datacenters. Meanwhile, recent machine learning work has provided viable solutions for model compression, pruning, and quantization for heterogeneous environments; for a given machine learning model, one can now easily find or even generate a series of models with different tradeoffs between accuracy and efficiency. We design and implement JellyBean, a framework for serving and optimizing machine learning inference workflows on heterogeneous infrastructures. Given service-level objectives (e.g., throughput, accuracy), JellyBean automatically selects the most cost-efficient models that meet the accuracy target and decides how to deploy them across different tiers of infrastructure. Evaluations show that JellyBean reduces the total serving cost of visual question answering by up to 58%, and of vehicle tracking from the NVIDIA AI City Challenge by up to 36%, compared with state-of-the-art model selection and worker assignment solutions. JellyBean also outperforms prior ML serving systems (e.g., Spark on the cloud) by up to 5x in serving costs.
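The selection problem described in the abstract can be illustrated with a toy sketch. This is not JellyBean's algorithm: it is a brute-force search under assumptions made purely for illustration. It assumes that end-to-end accuracy is the product of per-stage accuracies, that each model can be placed on its cheapest tier independently, and that data-transfer costs between tiers can be ignored. All model names, accuracies, and costs below are invented placeholder values, not results from the paper.

```python
from itertools import product
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str        # hypothetical model name
    accuracy: float  # hypothetical standalone accuracy in [0, 1]
    cost: dict       # hypothetical cost per request, keyed by tier name


def select_cheapest(stages, accuracy_target):
    """Pick one model per stage, minimizing cost subject to an accuracy target.

    Returns (total_cost, plan, accuracy), or None if no combination qualifies.
    """
    best = None
    for combo in product(*stages):
        # Assumption: workflow accuracy is the product of stage accuracies.
        accuracy = 1.0
        for candidate in combo:
            accuracy *= candidate.accuracy
        if accuracy < accuracy_target:
            continue
        # Assumption: each model runs on its own cheapest tier,
        # and moving data between tiers costs nothing.
        plan = [(c.name, min(c.cost, key=c.cost.get)) for c in combo]
        total = sum(min(c.cost.values()) for c in combo)
        if best is None or total < best[0]:
            best = (total, plan, accuracy)
    return best


# Invented two-stage workflow with two candidate models per stage.
stages = [
    [
        Candidate("detector-small", 0.80, {"edge": 1.0, "cloud": 2.0}),
        Candidate("detector-large", 0.95, {"edge": 4.0, "cloud": 3.0}),
    ],
    [
        Candidate("tracker-small", 0.85, {"edge": 1.0, "cloud": 1.5}),
        Candidate("tracker-large", 0.90, {"edge": 5.0, "cloud": 2.0}),
    ],
]

best = select_cheapest(stages, accuracy_target=0.80)
print(best)
```

With these placeholder numbers, the search rejects both combinations that use the small detector because they fall below the target, then prefers the large detector on the cloud tier paired with the small tracker on the edge tier. Exhaustive enumeration grows exponentially with the number of stages, so it only serves to make the objective and constraint concrete.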

