As the usage of Artificial Intelligence (AI) on resource-intensive and safety-critical tasks increases, a variety of Machine Learning (ML) compilers have been developed, enabling compatibility of Deep Neural Networks (DNNs) with a variety of hardware acceleration devices. However, given that DNNs are widely utilized for challenging and demanding tasks, the behavior of these compilers must be verified. To this direction, we propose MutateNN, a tool that utilizes elements of both differential and mutation testing in order to examine the robustness of image recognition models when deployed on hardware accelerators with different capabilities, in the presence of faults in their target device code - introduced either by developers, or problems in their compilation process. We focus on the image recognition domain by applying mutation testing to 7 well-established DNN models, introducing 21 mutations of 6 different categories. We deployed our mutants on 4 different hardware acceleration devices of varying capabilities and observed that DNN models presented discrepancies of up to 90.3% in mutants related to conditional operators across devices. We also observed that mutations related to layer modification, arithmetic types and input affected severely the overall model performance (up to 99.8%) or led to model crashes, in a consistent manner across devices.
翻译:暂无翻译