Pre-trained giant code models (PCMs) are starting to enter developers' daily practice. Understanding what types of software knowledge, and how much of it, is packed into PCMs is the foundation for incorporating PCMs into software engineering (SE) tasks and fully releasing their potential. In this work, we conduct the first systematic study of the SE factual knowledge in the state-of-the-art PCM Copilot, focusing on APIs' Fully Qualified Names (FQNs), the fundamental knowledge for effective code analysis, search, and reuse. Driven by the data distribution properties of FQNs, we design a novel, lightweight in-context learning method on Copilot for FQN inference, which requires neither code compilation, as traditional methods do, nor gradient updates, as recent FQN prompt-tuning does. We systematically experiment with five in-context learning design factors to identify the best in-context learning configuration that developers can adopt in practice. With this best configuration, we investigate the effects of the number of example prompts and of FQN data properties on Copilot's FQN inference capability. Our results confirm that Copilot stores diverse FQN knowledge and can be applied to FQN inference, owing to its high inference accuracy and its non-reliance on code analysis. Based on our experience interacting with Copilot, we discuss various opportunities to improve the human-Copilot interaction in the FQN inference task.
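To make the in-context learning setup concrete, the following is a minimal sketch of how a few-shot FQN-inference prompt might be assembled: a handful of demonstration pairs (a code line using a simple API name, plus that name's fully qualified name) followed by an unanswered query for the model to complete. The demonstration APIs and the prompt layout here are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch of few-shot prompt construction for FQN inference.
# The example pairs and the "Code: ... / FQN of X:" layout are assumptions
# made for illustration; the study's actual prompt design may differ.

EXAMPLES = [
    # (code line with a simple API name, the simple name, its FQN)
    ("List<String> names = new ArrayList<>();", "ArrayList", "java.util.ArrayList"),
    ("StringBuilder sb = new StringBuilder();", "StringBuilder", "java.lang.StringBuilder"),
]

def build_prompt(query_code: str, query_name: str) -> str:
    """Concatenate demonstration pairs followed by the unanswered query,
    so the model completes the final FQN by analogy with the examples."""
    parts = []
    for code, name, fqn in EXAMPLES:
        parts.append(f"Code: {code}\nFQN of {name}: {fqn}\n")
    # The query ends with an open slot that the model is expected to fill.
    parts.append(f"Code: {query_code}\nFQN of {query_name}:")
    return "\n".join(parts)

prompt = build_prompt("Map<String, Integer> m = new HashMap<>();", "HashMap")
print(prompt)
```

A key property of this setup, as the abstract notes, is that it needs only textual prompts: no compilation of the surrounding code and no gradient update to the model.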