Recently, deep learning models have been widely applied in program understanding tasks, and these models achieve state-of-the-art results on many benchmark datasets. A major challenge of deep learning for program understanding is that the effectiveness of these approaches depends on the quality of their datasets, and these datasets often contain noisy data samples. A typical kind of noise in program understanding datasets is label noises, which means that the target outputs for some inputs are mislabeled. Label noises may have a negative impact on the performance of deep learning models, so researchers have proposed various approaches to alleviate the impact of noisy labels, and formed a new research topic: noisy label learning (NLL). In this paper, we conduct an empirical study on the effectiveness of noisy label learning on deep learning for program understanding datasets. We evaluate various noisy label learning approaches and deep learning models on two tasks: program classification and code summarization. From the evaluation results, we find that the impact of label noise and NLL approaches on small deep learning models and large pre-trained models are different: small models are prone to label noises in program classification and NLL approaches can improve their robustness, while large pre-trained models are robust against label noises and NLL does not significantly improve their performances. On the other hand, NLL approaches have shown satisfying results in identifying noisy labeled samples for both tasks, indicating that these techniques can benefit researchers in building high-quality program understanding datasets.
翻译:暂无翻译