In language modeling, neologisms are new tokens trained to represent a concept not already included in a given model's vocabulary. Neologisms can be used to encourage specific behavior in models, for example by appending "Give me a neologism answer." to a prompt. Behavioral steering can also be achieved through fine-tuning, albeit with more compute and less flexibility: learning a neologism trains only d parameters and leaves the model's default behavior accessible to the user. We compare the performance of neologism learning against low-rank adaptation (LoRA) fine-tuning, finding that neologisms outperform fine-tuned models under a matched training setup (same data and hyperparameters). We also investigate self-verbalizations of neologisms, and observe that the model will occasionally make up its own new words when asked about a neologism.
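To make the "only d parameters" point concrete, here is a minimal sketch of the idea in PyTorch: the base model is frozen, and the only trainable tensor is a single new d-dimensional embedding vector. The toy model (an embedding table plus a linear head) and the training target are illustrative assumptions, not the paper's actual setup, which uses full language models.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pretrained model: embedding table + linear head.
d = 16                 # embedding dimension -- the only d trainable parameters
vocab, n_cls = 100, 4
embed = nn.Embedding(vocab, d)
head = nn.Linear(d, n_cls)
for p in list(embed.parameters()) + list(head.parameters()):
    p.requires_grad_(False)   # freeze the base model entirely

# The neologism: one new d-dimensional vector, the sole trainable tensor.
neologism = nn.Parameter(torch.randn(d) * 0.02)

def forward(use_neologism: bool, token_id: int = 0):
    # Using the neologism swaps in the learned vector; otherwise the
    # frozen embedding is used, so default behavior stays untouched.
    vec = neologism if use_neologism else embed(torch.tensor(token_id))
    return head(vec)

# Train only the neologism vector toward a desired behavior
# (here, an arbitrary toy target class).
opt = torch.optim.Adam([neologism], lr=0.1)
target = torch.tensor([2])
for _ in range(200):
    loss = nn.functional.cross_entropy(forward(True).unsqueeze(0), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("steered prediction:", forward(True).argmax().item())
print("default prediction:", forward(False).argmax().item())
```

Because the base weights never receive gradients, the user can toggle between steered and default behavior at inference time simply by including or omitting the neologism token, which is the flexibility advantage over fine-tuning described above.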