JASS: A Flexible Checkpointing System for NVM-based Systems - 专知论文 (Zhuanzhi Papers)

January 27, 2023

JASS: A Flexible Checkpointing System for NVM-based Systems

Akshin Singh, Smruti R. Sarangi

From arXiv, 13 pages, 11 figures

NVM-based systems are natural candidates for periodic checkpointing (or snapshotting). Checkpointing increases the reliability of the system, makes it more immune to power failures, and reduces wasted work, especially in an HPC setup. The traditional line of thinking is to design a system that is conceptually similar to transactional memory, where we log updates all the time and minimize the wasted work or, alternatively, the MTTR (mean time to recovery). Such "instant recovery" systems allow the system to recover from a point that is quite close to the point of failure. The penalty that we pay is a prohibitive number of additional writes to the NVM. We propose a paradigmatically different approach in this paper: we argue that in most practical settings, such as regular HPC workloads or neural network training, there is no need for such instant recovery. This means that we can afford to lose some work, take periodic software-initiated checkpoints, and still meet the goals of the application. The key benefit of our scheme is that we reduce write amplification (WA) substantially; this extends the life of NVMs by roughly the same factor. We go a step further and design an adaptive system that can minimize the WA given a target checkpoint latency, and show that our control algorithm almost always performs near-optimally. Our scheme reduces the WA by 2.3-96% as compared to the nearest competing work.
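To make the trade-off between continuous logging and periodic checkpointing concrete, here is a minimal toy sketch in Python. It is not the JASS design or its control algorithm. The cost model (two NVM writes per logged update, one NVM write per dirty block per checkpoint), the function names, and the example trace are all assumptions chosen for illustration.

```python
from typing import List


def writes_with_logging(trace: List[int]) -> int:
    """Assumed cost model: every update is written to the NVM twice,
    once to a log and once in place."""
    return 2 * len(trace)


def writes_with_checkpoints(trace: List[int], interval: int) -> int:
    """Assumed cost model: updates stay in volatile memory, and each
    checkpoint writes every block dirtied since the last one exactly once."""
    total = 0
    dirty = set()
    for i, block in enumerate(trace, start=1):
        dirty.add(block)
        if i % interval == 0:  # software-initiated periodic checkpoint
            total += len(dirty)
            dirty.clear()
    total += len(dirty)  # final checkpoint for any leftover dirty blocks
    return total


def write_amplification(nvm_writes: int, app_writes: int) -> float:
    """Toy ratio of NVM block writes to application-level updates."""
    return nvm_writes / app_writes


# Hypothetical trace: block IDs updated by the application, in order.
trace = [0, 1, 0, 1, 2, 0, 1, 0]

print(write_amplification(writes_with_logging(trace), len(trace)))
for interval in (4, 8):
    nvm_writes = writes_with_checkpoints(trace, interval)
    print(interval, write_amplification(nvm_writes, len(trace)))
```

In this toy model a longer checkpoint interval lets repeated updates to the same block coalesce, so fewer NVM writes are issued. The price is that more work is lost if a failure happens just before the next checkpoint.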
