The findable, accessible, interoperable, and reusable (FAIR) data principles have provided a framework for examining, evaluating, and improving how we share data with the aim of facilitating scientific discovery. Efforts have been made to generalize these principles to research software and other digital products. Artificial intelligence (AI) models -- algorithms that have been trained on data rather than explicitly programmed -- are an important target for this generalization because of the ever-increasing pace at which AI is transforming scientific and engineering domains. In this paper, we propose a practical definition of FAIR principles for AI models and create a FAIR AI project template that promotes adherence to these principles. We demonstrate how to implement these principles using a concrete example from experimental high energy physics: a graph neural network for identifying Higgs bosons decaying to bottom quarks. We study the robustness of these FAIR AI models and their portability across hardware architectures and software frameworks, and report new insights on the interpretability of AI predictions by studying the interplay between FAIR datasets and AI models. Enabled by publishing FAIR AI models, these studies pave the way toward reliable and automated AI-driven scientific discovery.