Reproducibility is a growing concern in Artificial Intelligence (AI), particularly in the area of Deep Learning (DL). Being able to reproduce DL models is crucial for AI-based systems, as it is closely tied to various tasks such as training, testing, debugging, and auditing. However, DL models are challenging to reproduce due to issues such as randomness in the software (e.g., DL algorithms) and non-determinism in the hardware (e.g., GPUs). There are various practices to mitigate some of these issues, but many of them are either too intrusive or only work in a specific usage context. In this paper, we propose a systematic approach to training reproducible DL models. Our approach includes three main parts: (1) a set of general criteria to thoroughly evaluate the reproducibility of DL models across two different domains, (2) a unified framework which leverages a record-and-replay technique to mitigate software-related randomness and a profile-and-patch technique to control hardware-related non-determinism, and (3) a reproducibility guideline which explains the rationale and mitigation strategies for conducting a reproducible training process for DL models. Case study results show that our approach can successfully reproduce six open source DL models and one commercial DL model.
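As a minimal illustration of the software-related randomness and hardware-related non-determinism mentioned above, the sketch below shows the conventional seed-and-flag mitigation in PyTorch. This is not the record-and-replay or profile-and-patch framework proposed in this paper; the function name, seed value, and environment-variable setting are assumptions chosen for demonstration only.

```python
import os
import random

import numpy as np
import torch


def make_training_deterministic(seed: int = 42) -> None:
    """Conventional seed-and-flag mitigation (illustrative sketch, not the paper's framework)."""
    # Software-related randomness: fix the seeds of the commonly used RNGs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Hardware-related non-determinism: force deterministic GPU kernels.
    # cuBLAS (CUDA >= 10.2) requires this workspace setting for determinism.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    torch.use_deterministic_algorithms(True)
```

Such seed-and-flag settings are among the existing practices the abstract alludes to: they are intrusive (they must be applied to the training code itself) and context-specific (flags and environment variables differ across frameworks and CUDA versions), which motivates the more systematic approach proposed in the paper.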