A Case Study on Parallel HDF5 Dataset Concatenation for High Energy Physics Data Analysis - 专知 Papers


May 2, 2022

A Case Study on Parallel HDF5 Dataset Concatenation for High Energy Physics Data Analysis


Sunwoo Lee, Kai-yuan Hou, Kewei Wang, Saba Sehrish, Marc Paterno, James Kowalkowski, Quincey Koziol, Robert Ross, Ankit Agrawal, Alok Choudhary, Wei-keng Liao

In High Energy Physics (HEP), experimentalists generate large volumes of data that, when analyzed, help us better understand the fundamental particles and their interactions. This data is often captured in many small files, creating a data management challenge for scientists. To facilitate data management, transfer, and analysis on large-scale platforms, it is advantageous to aggregate the data into a smaller number of larger files. However, this translation process can consume significant time and resources, and if performed incorrectly, the resulting aggregated files can be inefficient for highly parallel access during analysis on large-scale platforms. In this paper, we present our case study on parallel I/O strategies and HDF5 features for reducing data aggregation time, making effective use of compression, and ensuring efficient access to the resulting data during analysis at scale. In this case study we focus on detector data from NOvA, a large-scale HEP experiment that generates many terabytes of data. The lessons learned from our case study inform the handling of similar datasets, thus expanding community knowledge related to this common data management task.
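The abstract does not spell out how the aggregation is organized, so the following is only an illustrative sketch of one bookkeeping step that any concatenation of many small files into one large dataset tends to need: deciding where each input file's rows land in the output, and which worker handles which files. The function names `plan_concatenation` and `assign_files` are hypothetical, the per-file row counts are assumed to be known in advance, and no HDF5 or MPI calls are made; this is not the authors' method.

```python
from itertools import accumulate


def plan_concatenation(rows_per_file):
    """Return (offsets, total_rows) for appending files in order along axis 0.

    offsets[i] is the first output row that input file i would occupy.
    Hypothetical helper for illustration; row counts are assumed known.
    """
    if any(n < 0 for n in rows_per_file):
        raise ValueError("row counts must be non-negative")
    if not rows_per_file:
        return [], 0
    # Exclusive prefix sum: file i starts where files 0..i-1 end.
    offsets = [0] + list(accumulate(rows_per_file))[:-1]
    return offsets, sum(rows_per_file)


def assign_files(num_files, num_workers):
    """Split file indices into contiguous, near-equal blocks, one per worker."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(num_files, num_workers)
    blocks, start = [], 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional file each.
        size = base + (1 if worker < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    offsets, total = plan_concatenation([3, 0, 5])
    print(offsets, total)
    print([list(block) for block in assign_files(5, 2)])
```

With offsets fixed up front, each worker can write its rows into a disjoint region of the output dataset without coordinating with the others during the write itself. Whether that layout performs well in practice depends on chunking, compression, and the file system, which are the questions the case study addresses.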

