As Deep Neural Networks (DNNs) are rapidly being adopted within large software systems, software developers are increasingly required to design, train, and deploy such models as part of the systems they build. Consequently, testing and improving the robustness of these models have received considerable attention lately. However, relatively little effort has been devoted to the difficulties developers face when designing and training such models: if the evaluation of a model shows poor performance after the initial training, what should the developer change? We survey and evaluate existing state-of-the-art techniques that can be used to repair model performance, using a benchmark of both real-world mistakes that developers made while designing DNN models and artificial faulty models generated by mutating model code. The empirical evaluation shows that a random baseline is comparable with, and sometimes outperforms, existing state-of-the-art techniques. Moreover, for larger and more complex models, all repair techniques fail to find fixes. Our findings call for further research to develop more sophisticated techniques for Deep Learning repair.
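To make the benchmark construction concrete, the following minimal Python/Keras sketch (our illustration, not the paper's tooling) shows one mutation operator of the kind commonly used to generate artificial faulty models: swapping a layer's activation function in an otherwise correct model. The helpers `build_model` and `mutate_first_activation` are hypothetical names introduced here for illustration.

```python
# A minimal sketch (assumption: a Keras Sequential model) of one mutation
# operator used to generate artificial faulty models: replacing an
# activation function in an otherwise correct model.
from tensorflow import keras

def build_model():
    # Small, correct reference model (hypothetical example).
    return keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        keras.layers.Dense(10, activation="softmax"),
    ])

def mutate_first_activation(model, new_activation):
    # Copy the model's config, swap the activation of the first layer
    # that has one, and rebuild: the result is a mutant with an
    # injected "wrong activation" fault.
    config = model.get_config()
    for layer_cfg in config["layers"]:
        if "activation" in layer_cfg["config"]:
            layer_cfg["config"]["activation"] = new_activation
            break
    return keras.Sequential.from_config(config)

original = build_model()
mutant = mutate_first_activation(original, "linear")
```

A repair technique would then be judged by whether, starting from such a mutant (or from a real faulty model), it can change the model's design or training setup so that the evaluated performance recovers.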