Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems - 专知 (Zhuanzhi) Papers

High accuracy · Computing resources · Serving systems · Machine learning inference · Learning inference

April 21, 2023

Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems

Mehran Salmani, Saeid Ghafouri, Alireza Sanaee, Kamran Razavi, Max Mühlhäuser, Joseph Doyle, Pooyan Jamshidi, Mohsen Sharif

The use of machine learning (ML) inference across applications is growing rapidly. ML inference services interact with users directly and therefore require fast and accurate responses. Moreover, these services face dynamic request workloads, which force changes in their computing resources. Failing to right-size computing resources results in either latency service level objective (SLO) violations or wasted computing resources. Adapting to dynamic workloads while accounting for all three pillars of accuracy, latency, and resource cost is challenging. In response to these challenges, we propose InfAdapter, which proactively selects a set of ML model variants together with their resource allocations to meet the latency SLO while maximizing an objective function composed of accuracy and cost. Compared to a popular industry autoscaler (Kubernetes Vertical Pod Autoscaler), InfAdapter decreases SLO violations and cost by up to 65% and 33%, respectively.

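The selection problem described in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: the profile numbers, the variant names, the weights `alpha` and `beta`, the function name `select_variants`, and the assumption that traffic is split across variants in proportion to their capacity are all hypothetical, chosen only to show how a set of model variants and their CPU allocations might be picked so that the latency SLO is met while an accuracy-minus-cost objective is maximized.

```python
from itertools import product

# Hypothetical profiling data (illustrative values, not from the paper):
# variant name -> (accuracy in %, requests/s per CPU core, tail latency in ms)
VARIANTS = {
    "small": (70.0, 40.0, 50.0),
    "medium": (76.0, 15.0, 100.0),
    "large": (78.0, 6.0, 240.0),
}


def select_variants(variants, load_rps, slo_ms, cpu_budget, alpha=1.0, beta=0.5):
    """Brute-force search over integer CPU allocations per variant.

    Returns (score, {variant: cores}) for the best feasible allocation,
    or None if no allocation can serve the load within the budget.
    """
    # Only variants whose profiled latency meets the SLO are eligible.
    eligible = [name for name, (_, _, lat) in variants.items() if lat <= slo_ms]
    best = None
    for cores in product(range(cpu_budget + 1), repeat=len(eligible)):
        total_cores = sum(cores)
        if total_cores == 0 or total_cores > cpu_budget:
            continue
        # Aggregate throughput the allocation can sustain.
        capacity = sum(c * variants[n][1] for n, c in zip(eligible, cores))
        if capacity < load_rps:
            continue
        # Assumption: requests are split in proportion to each variant's capacity,
        # so the average accuracy is a capacity-weighted mean.
        avg_acc = sum(
            c * variants[n][1] * variants[n][0] for n, c in zip(eligible, cores)
        ) / capacity
        # Objective: reward accuracy, penalize the number of cores used.
        score = alpha * avg_acc - beta * total_cores
        if best is None or score > best[0]:
            allocation = {n: c for n, c in zip(eligible, cores) if c > 0}
            best = (score, allocation)
    return best


if __name__ == "__main__":
    print(select_variants(VARIANTS, load_rps=100, slo_ms=150, cpu_budget=5))
```

With these made-up numbers, the search excludes `large` because its latency exceeds the 150 ms SLO and ends up mixing one core of `small` with four cores of `medium`. Exhaustive enumeration is only practical for a handful of variants and a small core budget; a real system would need a proper solver and a workload forecast to act proactively.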
