Code language models excel on code intelligence tasks, yet their internal interpretability is underexplored. Existing neuron interpretability techniques from NLP are suboptimal for source code due to programming languages' formal, hierarchical, and executable nature. We empirically investigate code LLMs at the neuron level, localizing language-specific neurons (selectively responsive to a single language) and concept layers (feed-forward layers encoding language-agnostic code representations). We analyze Llama-3.1-8B and Qwen2.5-Coder-32B on multilingual inputs in C++, Java, Python, Go, and JavaScript, measuring neuron selectivity and layerwise contributions during generation. We find that (1) some neurons specialize in individual languages while a universal subset supports general-purpose generation, and (2) lower layers mainly encode language-specific syntax, whereas middle layers capture semantic abstractions shared across languages, emerging as concept layers. We demonstrate the utility of these findings on three tasks: neuron-guided fine-tuning for code generation, clone detection via concept-layer embeddings, and concept-layer-guided transfer for code summarization, each yielding consistent gains in multilingual settings.
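The abstract names two concrete analyses: scoring per-neuron language selectivity in the feed-forward layers, and reading out language-agnostic embeddings from middle "concept" layers. As a rough illustration of the first, the sketch below assumes a simple activation-share criterion (the abstract does not spell out the paper's exact scoring rule): forward hooks capture each FFN neuron's mean activation per language, and neurons whose activation mass concentrates on one language are flagged as language-specific. The toy snippets, the hook placement on the FFN activation function, and the 0.9 threshold are all illustrative assumptions.

```python
# Minimal sketch, assuming an activation-share selectivity criterion
# (not necessarily the paper's method). Captures per-language FFN
# activations with forward hooks, then flags selective neurons.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"  # one of the two models in the abstract
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

snippets = {  # tiny illustrative corpus; a real analysis needs far more code
    "python": "def add(a, b):\n    return a + b",
    "java": "int add(int a, int b) { return a + b; }",
    "go": "func add(a, b int) int { return a + b }",
}

acts = {}  # (layer_idx, language) -> mean activation per FFN neuron

def make_hook(layer_idx, lang):
    def hook(_module, _inputs, output):
        # output of mlp.act_fn is the post-activation FFN state [1, T, d_ff]
        acts[(layer_idx, lang)] = output.float().mean(dim=(0, 1))
    return hook

for lang, code in snippets.items():
    handles = [
        layer.mlp.act_fn.register_forward_hook(make_hook(i, lang))
        for i, layer in enumerate(model.model.layers)
    ]
    with torch.no_grad():
        model(**tok(code, return_tensors="pt"))
    for h in handles:
        h.remove()

# Selectivity: a neuron's share of (non-negative) activation on its top language.
langs = list(snippets)
for i in range(len(model.model.layers)):
    stacked = torch.stack([acts[(i, l)].clamp(min=0) for l in langs])  # [L, d_ff]
    share = stacked.max(dim=0).values / (stacked.sum(dim=0) + 1e-6)
    n_specific = int((share > 0.9).sum())  # 0.9 is an assumed cutoff
    print(f"layer {i:2d}: {n_specific} language-specific neurons")
```

A companion sketch for the clone-detection use: assuming a middle layer acts as the concept layer (index 16 here, an illustrative choice; in practice one would sweep layers), mean-pool its hidden states into an embedding and score cross-language pairs by cosine similarity. The layer index and the reading of "high similarity = clone" are assumptions, not the paper's reported configuration.

```python
# Companion sketch: clone detection via assumed concept-layer embeddings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

CONCEPT_LAYER = 16  # assumed middle layer of a 32-layer model

def embed(code: str) -> torch.Tensor:
    """Mean-pool the concept-layer hidden states into one vector."""
    inputs = tok(code, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[CONCEPT_LAYER].float().mean(dim=1).squeeze(0)

# Semantically equivalent snippets in two languages should embed nearby.
py = "def add(a, b):\n    return a + b"
go = "func add(a, b int) int { return a + b }"
sim = torch.nn.functional.cosine_similarity(embed(py), embed(go), dim=0)
print(f"cross-language similarity: {sim.item():.3f}")  # high => likely clones
```

In this framing, the abstract's finding corresponds to cross-language similarity being low at early layers (language-specific syntax) and peaking at middle layers (shared semantics), which is how the concept layers would be located in practice.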

