In NLP, reusing pre-trained models instead of training from scratch has gained popularity; however, NLP models are mostly black boxes, very large, and often demand significant computational resources. To ease this, models trained on large corpora are made available, and developers reuse them for different problems. In contrast, for traditional DL problems, developers mostly build their models from scratch, which gives them control over the choice of algorithms, data processing, model structure, hyperparameter tuning, etc. In NLP, because pre-trained models are reused, developers have little to no control over such design decisions; instead, they apply fine-tuning or transfer learning to pre-trained models to meet their requirements. Moreover, NLP models and their corresponding datasets are significantly larger than traditional DL models and require heavy computation. These factors often lead to bugs when pre-trained models are reused. While bugs in traditional DL software have been studied intensively, the extensive reuse and black-box structure of NLP models motivate the following questions: What types of bugs occur while reusing NLP models? What are the root causes of those bugs? How do these bugs affect the system? To answer these questions, we studied the bugs reported while reusing 11 popular NLP models. We mined 9,214 issues from their GitHub repositories and identified 984 bugs, from which we created a taxonomy of bug types, root causes, and impacts. Our analysis led to several findings, including that limited access to model internals results in a lack of robustness, that missing input validation lets algorithmic and data bias propagate, and that high resource consumption causes more crashes. The bug patterns we observed should facilitate further efforts to reduce bugs in pre-trained model and code reuse.
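As a concrete illustration of the reuse workflow described above, the minimal sketch below fine-tunes a pre-trained model for a downstream classification task while keeping the reused encoder frozen, showing how the developer controls only the new task head and a few hyperparameters. It assumes the Hugging Face transformers library and PyTorch; the checkpoint name, label count, and example input are illustrative and not taken from the study.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-uncased"  # illustrative choice of pre-trained model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# The pre-trained encoder is reused as-is: its architecture, vocabulary, and
# weights are fixed upstream, so only the new classification head and the
# fine-tuning hyperparameters remain under the developer's control.
for param in model.base_model.parameters():
    param.requires_grad = False  # freeze the reused encoder (transfer learning)

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-5
)

# One illustrative fine-tuning step on a toy labeled example.
inputs = tokenizer("The model crashed on long inputs.",
                   return_tensors="pt", truncation=True, padding=True)
labels = torch.tensor([1])
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()
```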