HLSDataset: Open-Source Dataset for ML-Assisted FPGA Design using High Level Synthesis

Machine learning (ML) has been widely adopted in design exploration with high-level synthesis (HLS) to provide better and faster estimation of performance, resources, and power at very early stages of FPGA-based design. Accurate prediction requires large, high-quality datasets for training ML models. This paper presents HLSDataset, a dataset for ML-assisted FPGA design using HLS. The dataset is generated from widely used HLS C benchmarks, including Polybench, Machsuite, CHStone, and Rosetta. The Verilog samples are generated with a variety of directives, including loop unroll, loop pipeline, and array partition, so that optimized and realistic designs are covered. Nearly 9,000 Verilog samples are generated per FPGA type. To demonstrate the effectiveness of the dataset, we undertake case studies on power estimation and resource usage estimation with ML models trained on it. All code and the dataset are publicly available in the GitHub repo. We believe that HLSDataset can save researchers valuable time by sparing them the tedious process of running tools, scripting, and parsing files to generate a dataset, and enable them to spend more time where it counts, that is, in training ML models.
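As a rough illustration of how samples can be generated "with a variety of directives", the sketch below enumerates combinations of loop unroll, loop pipeline, and array partition settings for a single loop and array. It is not the authors' generation flow. The factor values, the function names, and the `gemm`/`loop_k`/`A` labels are hypothetical, and the emitted Tcl lines assume Vivado/Vitis HLS directive syntax, since the abstract does not name the HLS tool.

```python
from itertools import product

# Hypothetical design-space knobs. These values are illustrative assumptions,
# not the settings used to build HLSDataset.
UNROLL_FACTORS = [1, 2, 4]
PIPELINE_OPTIONS = [False, True]
PARTITION_FACTORS = [1, 2, 4]


def directive_script(top, loop, array, unroll, pipeline, partition):
    """Return Tcl directive lines for one design point.

    Assumes Vivado/Vitis HLS directive syntax. A factor of 1 means
    "leave the loop or array untouched", so no line is emitted for it.
    """
    lines = []
    if unroll > 1:
        lines.append(f'set_directive_unroll -factor {unroll} "{top}/{loop}"')
    if pipeline:
        lines.append(f'set_directive_pipeline "{top}/{loop}"')
    if partition > 1:
        lines.append(
            f"set_directive_array_partition -type cyclic "
            f'-factor {partition} -dim 1 "{top}" {array}'
        )
    return lines


def enumerate_design_points(top, loop, array):
    """Enumerate every combination of the three directive knobs."""
    points = []
    for unroll, pipeline, partition in product(
        UNROLL_FACTORS, PIPELINE_OPTIONS, PARTITION_FACTORS
    ):
        points.append(
            {
                "unroll": unroll,
                "pipeline": pipeline,
                "partition": partition,
                "tcl": directive_script(
                    top, loop, array, unroll, pipeline, partition
                ),
            }
        )
    return points


if __name__ == "__main__":
    # Hypothetical kernel, loop label, and array name.
    points = enumerate_design_points("gemm", "loop_k", "A")
    print(len(points), "design points")
    for line in points[-1]["tcl"]:
        print(line)
```

Each design point would then be synthesized to Verilog and labeled with the reported resource and power figures. A real flow would enumerate directives per loop and per array, so the number of design points grows quickly with kernel size.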

