ORNL中子散射数据减少工作流程的有效数据管理 (Efficient Data Management in Neutron Scattering Data Reduction Workflows at ORNL) - 专知论文

会员服务 ·

0

数据缩减 · 可约的 · 可辨认的 · Performer · 优化器 ·

2021 年 1 月 5 日

Efficient Data Management in Neutron Scattering Data Reduction Workflows at ORNL

翻译：ORNL中子散射数据减少工作流程的有效数据管理

William F Godoy,Peter F Peterson,Steven E Hahn,Jay J Billings

from arxiv, 7 pages, 4 figures, International Workshop on Big Data Reduction held with 2020 IEEE International Conference on Big Data

Oak Ridge National Laboratory (ORNL) experimental neutron science facilities produce 1.2\,TB a day of raw event-based data that is stored using the standard metadata-rich NeXus schema built on top of the HDF5 file format. Performance of several data reduction workflows is largely determined by the amount of time spent on the loading and processing algorithms in Mantid, an open-source data analysis framework used across several neutron sciences facilities around the world. The present work introduces new data management algorithms to address identified input output (I/O) bottlenecks on Mantid. First, we introduce an in-memory binary-tree metadata index that resemble NeXus data access patterns to provide a scalable search and extraction mechanism. Second, data encapsulation in Mantid algorithms is optimally redesigned to reduce the total compute and memory runtime footprint associated with metadata I/O reconstruction tasks. Results from this work show speed ups in wall-clock time on ORNL data reduction workflows, ranging from 11\% to 30\% depending on the complexity of the targeted instrument-specific data. Nevertheless, we highlight the need for more research to address reduction challenges as experimental data volumes increase.

翻译：Oak Ridge国家实验中子科学实验室(ORNL)实验性中子科学设施(OrnL)每天产生1.2\,TB 原始事件数据,该原始事件数据储存日数以HDF5文件格式之上的富集元元数据NeXus schema为基础。若干数据减少工作流程的绩效主要取决于曼蒂德的载荷和处理算法所花费的时间,曼蒂德是一个开放源数据分析框架,世界各地若干中子科学设施都使用这一框架。目前的工作引入了新的数据管理算法,以解决曼蒂德的确定输入输出(I/O)瓶颈问题。首先,我们引入了类似于NeXus数据访问模式的模拟双轨元数据指数,以提供一个可缩放的搜索和提取机制。第二,对曼蒂德算法中的数据封装进行了最佳的重新设计,以减少与元数据I/O重建任务相关的总计算和记忆运行时间足迹。这项工作的结果显示,ORNL数据减少工作流程在倒时加快速度,从11 ⁇ 至30 ⁇ 不等,这取决于特定仪器数据的复杂性。然而的数据减少量的实验性研究需要进一步减少数据。然而。然而,我们强调需要更多研究。

0

相关内容

数据缩减

多伦多大学最新《机器学习导论》课程，Introduction to Machine Learning

多伦多大学最新《机器学习导论》课程，Introduction to Machine Learning

专知会员服务

25+阅读 · 2020年9月24日

【2020新书】实战测试自动化，Practical Test Automation，327页pdf

【2020新书】实战测试自动化，Practical Test Automation，327页pdf

专知会员服务

34+阅读 · 2020年8月26日

【2020新书】现代数据仓库，297页pdf，The Modern Data Warehouse in Azure

【2020新书】现代数据仓库，297页pdf，The Modern Data Warehouse in Azure

专知会员服务

58+阅读 · 2020年6月17日

网络流量监测与分析大数据综述，A Survey on Big Data for Network Traffic Monitoring and Analysis

网络流量监测与分析大数据综述，A Survey on Big Data for Network Traffic Monitoring and Analysis

专知会员服务

65+阅读 · 2020年3月5日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

【O'Reilly AI Conference 2019】大规模构建和部署AI应用程序和系统（Building and deploying AI applications and systems at scale），O'Reilly的首席数据科学家Ben Lorica、Computable 联合创始人兼首席执行官Roger Chen

【O'Reilly AI Conference 2019】大规模构建和部署AI应用程序和系统（Building and deploying AI applications and systems at scale），O'Reilly的首席数据科学家Ben Lorica、Computable 联合创始人兼首席执行官Roger Chen

专知会员服务

24+阅读 · 2019年11月5日

Classic Clustering Algorithms to Live By [ 熊辉，罗格斯－新泽西州立大学教授] 2019年中国计算机大会计算机经典算法回顾与展望——机器学习与数据挖掘论坛

Classic Clustering Algorithms to Live By [ 熊辉，罗格斯－新泽西州立大学教授] 2019年中国计算机大会计算机经典算法回顾与展望——机器学习与数据挖掘论坛

专知会员服务

10+阅读 · 2019年10月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | EMNLP 2019等国际会议信息6条

计算机 | EMNLP 2019等国际会议信息6条

Call4Papers

18+阅读 · 2019年4月26日

CCF A类 | 顶级会议RTSS 2019诚邀稿件

CCF A类 | 顶级会议RTSS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年4月17日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

人工智能 | UAI 2019等国际会议信息4条

人工智能 | UAI 2019等国际会议信息4条

Call4Papers

6+阅读 · 2019年1月14日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Efficient Similarity Search in Dynamic Data Streams

Arxiv

0+阅读 · 2021年3月8日

Bottlenecks in Blockchain Consensus Protocols

Arxiv

0+阅读 · 2021年3月7日

Training a First-Order Theorem Prover from Synthetic Data

Training a First-Order Theorem Prover from Synthetic Data

Arxiv

0+阅读 · 2021年3月5日

Reusability in Research Data

Arxiv

0+阅读 · 2021年3月5日

An Optimized H.266/VVC Software Decoder On Mobile Platform

Arxiv

0+阅读 · 2021年3月5日

Performance Impact Analysis of Beam Switching in Millimeter Wave Vehicular Communications

Arxiv

0+阅读 · 2021年3月5日

Data Poisoning Attacks and Defenses to Crowdsourcing Systems

Arxiv

8+阅读 · 2021年2月18日

HCqa: Hybrid and Complex Question Answering on Textual Corpus and Knowledge Graph

Arxiv

3+阅读 · 2019年1月28日

Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

Arxiv

4+阅读 · 2018年10月11日

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Arxiv

4+阅读 · 2018年7月30日

VIP会员

文章信息

相关主题

相关VIP内容

多伦多大学最新《机器学习导论》课程，Introduction to Machine Learning

多伦多大学最新《机器学习导论》课程，Introduction to Machine Learning

专知会员服务

25+阅读 · 2020年9月24日

【2020新书】实战测试自动化，Practical Test Automation，327页pdf

【2020新书】实战测试自动化，Practical Test Automation，327页pdf

专知会员服务

34+阅读 · 2020年8月26日

【2020新书】现代数据仓库，297页pdf，The Modern Data Warehouse in Azure

【2020新书】现代数据仓库，297页pdf，The Modern Data Warehouse in Azure

专知会员服务

58+阅读 · 2020年6月17日

网络流量监测与分析大数据综述，A Survey on Big Data for Network Traffic Monitoring and Analysis

网络流量监测与分析大数据综述，A Survey on Big Data for Network Traffic Monitoring and Analysis

专知会员服务

65+阅读 · 2020年3月5日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

【O'Reilly AI Conference 2019】大规模构建和部署AI应用程序和系统（Building and deploying AI applications and systems at scale），O'Reilly的首席数据科学家Ben Lorica、Computable 联合创始人兼首席执行官Roger Chen

【O'Reilly AI Conference 2019】大规模构建和部署AI应用程序和系统（Building and deploying AI applications and systems at scale），O'Reilly的首席数据科学家Ben Lorica、Computable 联合创始人兼首席执行官Roger Chen

专知会员服务

24+阅读 · 2019年11月5日

Classic Clustering Algorithms to Live By [ 熊辉，罗格斯－新泽西州立大学教授] 2019年中国计算机大会计算机经典算法回顾与展望——机器学习与数据挖掘论坛

Classic Clustering Algorithms to Live By [ 熊辉，罗格斯－新泽西州立大学教授] 2019年中国计算机大会计算机经典算法回顾与展望——机器学习与数据挖掘论坛

专知会员服务

10+阅读 · 2019年10月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

训练扩散模型其实比你想象的更简单！何恺明团队新作Dispersive Loss：给扩散模型加正则化

【ICML2025】用于可扩展持续强化学习的自组合策略

最新4500字《死亡算法：人工智能如何助推对加沙的大规模杀戮》（附原文）

人工智能行业：2027年AI预测报告

相关资讯

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | EMNLP 2019等国际会议信息6条

计算机 | EMNLP 2019等国际会议信息6条

Call4Papers

18+阅读 · 2019年4月26日

CCF A类 | 顶级会议RTSS 2019诚邀稿件

CCF A类 | 顶级会议RTSS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年4月17日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

人工智能 | UAI 2019等国际会议信息4条

人工智能 | UAI 2019等国际会议信息4条

Call4Papers

6+阅读 · 2019年1月14日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

相关论文

Efficient Similarity Search in Dynamic Data Streams

Arxiv

0+阅读 · 2021年3月8日

Bottlenecks in Blockchain Consensus Protocols

Arxiv

0+阅读 · 2021年3月7日

Training a First-Order Theorem Prover from Synthetic Data

Training a First-Order Theorem Prover from Synthetic Data

Arxiv

0+阅读 · 2021年3月5日

Reusability in Research Data

Arxiv

0+阅读 · 2021年3月5日

An Optimized H.266/VVC Software Decoder On Mobile Platform

Arxiv

0+阅读 · 2021年3月5日

Performance Impact Analysis of Beam Switching in Millimeter Wave Vehicular Communications

Arxiv

0+阅读 · 2021年3月5日

Data Poisoning Attacks and Defenses to Crowdsourcing Systems

Arxiv

8+阅读 · 2021年2月18日

HCqa: Hybrid and Complex Question Answering on Textual Corpus and Knowledge Graph

Arxiv

3+阅读 · 2019年1月28日

Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

Arxiv

4+阅读 · 2018年10月11日

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Arxiv

4+阅读 · 2018年7月30日

微信扫码咨询专知VIP会员