Complex narrative contexts often challenge language models' ability to follow instructions, and existing benchmarks fail to capture these difficulties. To address this, we propose Concise-SAE, a training-free framework that improves instruction following by identifying and editing instruction-relevant neurons using only natural language instructions, without requiring labelled data. To thoroughly evaluate our method, we introduce FreeInstruct, a diverse and realistic benchmark of 1,212 examples that highlights the challenges of instruction following in narrative-rich settings. While initially motivated by complex narratives, Concise-SAE demonstrates state-of-the-art instruction adherence across varied tasks without compromising generation quality.