Commit Classification(CC) is an important task in software maintenance since it helps software developers classify code changes into different types according to their nature and purpose. This allows them to better understand how their development efforts are progressing, identify areas where they need improvement. However, existing methods are all discriminative models, usually with complex architectures that require additional output layers to produce class label probabilities. Moreover, they require a large amount of labeled data for fine-tuning, and it is difficult to learn effective classification boundaries in the case of limited labeled data. To solve above problems, we propose a generative framework that Incorporating prompt-tuning for commit classification with prior knowledge (IPCK) https://github.com/AppleMax1992/IPCK, which simplifies the model structure and learns features across different tasks. It can still reach the SOTA performance with only limited samples. Firstly, we proposed a generative framework based on T5. This encoder-decoder construction method unifies different CC task into a text2text problem, which simplifies the structure of the model by not requiring an extra output layer. Second, instead of fine-tuning, we design an prompt-tuning solution which can be adopted in few-shot scenarios with only limit samples. Furthermore, we incorporate prior knowledge via an external knowledge graph to map the probabilities of words into the final labels in the speech machine step to improve performance in few-shot scenarios. Extensive experiments on two open available datasets show that our framework can solve the CC problem simply but effectively in few-shot and zeroshot scenarios, while improving the adaptability of the model without requiring a large amount of training samples for fine-tuning.
翻译:暂无翻译