The FAIR Guiding Principles aim to improve the findability, accessibility, interoperability, and reusability of digital content by making it both human- and machine-actionable. However, these principles have not yet been broadly adopted in the domain of machine learning-based program analyses and optimizations for High-Performance Computing (HPC). In this paper, we design a methodology to make HPC datasets and machine learning models FAIR after investigating existing FAIRness assessment and improvement techniques. Our methodology includes a comprehensive, quantitative assessment of selected data, followed by concrete, actionable suggestions to improve FAIRness with respect to common issues related to persistent identifiers, rich metadata descriptions, licenses, and provenance information. Moreover, we select a representative training dataset to evaluate our methodology. The experiment shows that the methodology can effectively improve the FAIRness of the dataset and model from an initial score of 19.1% to a final score of 83.0%.