LMStream: When Distributed Micro-Batch Stream Processing Systems Meet GPU - Zhuanzhi Papers


November 8, 2021


Suyeon Lee,Sungyong Park

From arXiv, 11 pages

This paper presents LMStream, which ensures bounded latency while maximizing throughput on GPU-enabled micro-batch streaming systems. The design of LMStream rests on two novel mechanisms: (1) dynamic batching and (2) dynamic operation-level query planning. By controlling the micro-batch size, LMStream significantly reduces the latency of individual datasets, because it does not buffer data unconditionally just to raise GPU utilization. LMStream bounds the latency to an optimal value determined by the characteristics of the window operation used in the streaming application. Dynamically mapping each query to an execution device, based on the data size and a dynamic device preference, improves both throughput and latency as much as possible. In addition, LMStream proposes a low-overhead online method for optimizing the cost model's parameters without interrupting real-time stream processing. We implemented LMStream on Apache Spark, which supports micro-batch stream processing. Compared to the previous throughput-oriented method, LMStream improved average latency by up to 70.7% while improving average throughput by up to 1.74x.
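The abstract describes two coupled decisions: picking an execution device from the data size and a cost model, and sizing each micro-batch against a latency bound rather than buffering until the GPU is busy. The Python sketch below illustrates that idea under loudly stated assumptions. The linear cost model (`LinearCost`), the helper names (`choose_device`, `max_batch_rows`, `plan_micro_batch`), the exponential-moving-average parameter update, and all numbers are hypothetical and not taken from the paper. LMStream's actual cost model, its window-based derivation of the latency bound, and its Spark integration are not reproduced here; the bound is simply passed in.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LinearCost:
    """Hypothetical per-device cost model: t(rows) = fixed_ms + per_row_ms * rows."""

    fixed_ms: float    # assumed constant overhead per batch (e.g. launch/transfer setup)
    per_row_ms: float  # assumed marginal cost per input row; must be > 0

    def estimate(self, rows: int) -> float:
        """Estimated processing time in milliseconds for a batch of `rows` rows."""
        return self.fixed_ms + self.per_row_ms * rows

    def update(self, rows: int, observed_ms: float, alpha: float = 0.2) -> None:
        """Move per_row_ms toward the value implied by one observed run.

        An exponential moving average is a hypothetical stand-in for online
        parameter tuning; it is not LMStream's published method.
        """
        if rows <= 0:
            return
        implied = max(observed_ms - self.fixed_ms, 0.0) / rows
        self.per_row_ms = (1 - alpha) * self.per_row_ms + alpha * implied


def choose_device(models: dict[str, LinearCost], rows: int) -> str:
    """Pick the device whose model predicts the lowest time for this batch size."""
    return min(models, key=lambda device: models[device].estimate(rows))


def max_batch_rows(model: LinearCost, latency_bound_ms: float) -> int:
    """Largest batch whose estimated time stays within the latency bound."""
    if model.per_row_ms <= 0:
        raise ValueError("per_row_ms must be positive")
    if model.fixed_ms > latency_bound_ms:
        return 0  # even an empty batch would miss the bound on this device
    return int((latency_bound_ms - model.fixed_ms) // model.per_row_ms)


def plan_micro_batch(
    models: dict[str, LinearCost], pending_rows: int, latency_bound_ms: float
) -> tuple[str, int]:
    """Choose a device for the backlog, then cap the batch so it meets the bound.

    Rows beyond the cap are left for the next micro-batch instead of being
    buffered further to raise GPU utilization.
    """
    device = choose_device(models, pending_rows)
    rows = min(pending_rows, max_batch_rows(models[device], latency_bound_ms))
    return device, rows


if __name__ == "__main__":
    # Illustrative numbers only: the GPU has a higher fixed overhead but a lower per-row cost.
    models = {
        "cpu": LinearCost(fixed_ms=1.0, per_row_ms=0.125),
        "gpu": LinearCost(fixed_ms=20.0, per_row_ms=0.0078125),
    }
    print(plan_micro_batch(models, pending_rows=100, latency_bound_ms=100.0))     # ('cpu', 100)
    print(plan_micro_batch(models, pending_rows=50_000, latency_bound_ms=100.0))  # ('gpu', 10240)
```

With these made-up parameters, a small 100-row batch stays on the CPU because the GPU's fixed overhead dominates, while a large backlog goes to the GPU but is capped at 10,240 rows so the estimated time stays within the 100 ms bound. A linear model was chosen here only because it makes the CPU/GPU crossover easy to see; a real system would likely need richer per-operator cost terms.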

