The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems - 专知论文


February 11, 2022


Lei Zhang, Vaastav Anand, Zhiqiang Xie, Ymir Vigfusson, Jonathan Mace

Today's distributed tracing frameworks trace only a small fraction of all requests. For application developers troubleshooting rare edge-cases, the tracing framework is unlikely to capture a relevant trace at all, because it cannot know which requests will be problematic until after the fact. Application developers thus depend heavily on luck. In this paper, we remove the dependence on luck for any edge-case where symptoms can be programmatically detected, such as high tail latency, errors, and bottlenecked queues. We propose a lightweight and always-on distributed tracing system, Hindsight, where each constituent node acts analogously to a car dash-cam that, upon detecting a sudden jolt in momentum, persists the last hour of footage. Hindsight implements a retroactive sampling abstraction: when the symptoms of a problem are detected, Hindsight retrieves and persists coherent trace data from all relevant nodes that serviced the request. Developers using Hindsight receive the exact edge-case traces they desire; by comparison, existing sampling-based tracing systems depend wholly on serendipity. Our experimental evaluation shows that Hindsight successfully collects edge-case symptomatic requests in real-world use cases. Hindsight adds only nanosecond-level overhead to generate trace data, can handle GB/s of data per node, transparently integrates with existing distributed tracing systems, and persists full, detailed traces when an edge-case problem is detected.
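The retroactive sampling idea described in the abstract can be illustrated with a small sketch. This is not Hindsight's implementation or API: the names `Node`, `Coordinator`, `handle_request`, the `capacity` limit, and the 500 ms latency threshold are all hypothetical, chosen only to show the dash-cam pattern. Every node records events for every request into a bounded in-memory buffer, and only when a symptom is detected does a coordinator pull that request's events from all nodes and persist them.

```python
from collections import OrderedDict


class Node:
    """Hypothetical node: keeps recent trace data in a bounded buffer."""

    def __init__(self, name, capacity=1000):
        self.name = name
        self.capacity = capacity        # max number of traces held in memory
        self.buffer = OrderedDict()     # trace_id -> list of (node, event)

    def record(self, trace_id, event):
        # Always-on recording: evict the oldest trace when the buffer is full.
        if trace_id not in self.buffer and len(self.buffer) >= self.capacity:
            self.buffer.popitem(last=False)
        self.buffer.setdefault(trace_id, []).append((self.name, event))

    def collect(self, trace_id):
        # Hand over (and forget) whatever this node still holds for the trace.
        return self.buffer.pop(trace_id, [])


class Coordinator:
    """Hypothetical coordinator: persists a trace only after a trigger."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.persisted = {}             # stands in for durable storage

    def trigger(self, trace_id):
        # Retroactive sampling: gather the trace from every node that saw it.
        events = []
        for node in self.nodes:
            events.extend(node.collect(trace_id))
        self.persisted[trace_id] = events
        return events


def handle_request(trace_id, nodes, latency_ms, coordinator, threshold_ms=500):
    """Record on every node, then trigger if the symptom (high latency) shows."""
    for node in nodes:
        node.record(trace_id, f"handled request in {latency_ms} ms")
    if latency_ms > threshold_ms:
        coordinator.trigger(trace_id)
```

In this sketch a fast request leaves its events in the buffers until they are evicted, while a slow request is persisted in full from every node it touched. A trigger that arrives after eviction finds nothing, which is why the buffer size bounds how far back a trigger can reach.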

