新词学习：一种参数高效的模型引导替代方案 (Neologism Learning as a Parameter-Efficient Alternative to Fine-Tuning for Model Steering) - 专知论文

会员服务 ·

0

微调 · 参数高效 · 表示 · 包含 · 导模 ·

Neologism Learning as a Parameter-Efficient Alternative to Fine-Tuning for Model Steering

翻译：新词学习：一种参数高效的模型引导替代方案

Sungjoon Park,Varun Ramamurthi,Owen Terry

In language modeling, neologisms are new tokens trained to represent a concept not already included in a given model's vocabulary. Neologisms can be used to encourage specific behavior in models, for example by appending prompts with "Give me a neologism answer." Behavioral steering can also be achieved through fine-tuning, albeit with more compute and less flexibility: learning a neologism only trains d parameters and allows the user to still access the model's default behavior. We compare the performance of neologism learning against low-rank adaptation (LoRA) fine-tuning, finding that neologisms outperform fine-tuned models under a matched training setup (same data and hyperparameters). We also investigate self-verbalizations of neologisms, and observe that the model will occasionally make up its own new words when asked about a neologism.

翻译：在语言建模中，新词是指经过训练以表示给定模型词汇表中尚未包含概念的新标记。新词可用于引导模型产生特定行为，例如通过在提示后附加"请给出新词答案"。行为引导也可通过微调实现，但需要更多计算资源且灵活性较低：学习一个新词仅需训练d个参数，并允许用户仍能访问模型的默认行为。我们比较了新词学习与低秩自适应（LoRA）微调的性能，发现在匹配的训练设置（相同数据和超参数）下，新词方法优于微调模型。我们还研究了新词的自述现象，观察到当被问及新词时，模型偶尔会自行创造新词汇。

0

相关内容

【CVPR2022】MSDN: 零样本学习的互语义蒸馏网络

【CVPR2022】MSDN: 零样本学习的互语义蒸馏网络

专知会员服务

21+阅读 · 2022年3月8日

EMNLP 2021 | 预训练跨语言模型中的大词表构建及使用

EMNLP 2021 | 预训练跨语言模型中的大词表构建及使用

专知会员服务

22+阅读 · 2022年1月5日

字节跳动李航提出AMBERT！超越BERT！多粒度token预训练语言模型

字节跳动李航提出AMBERT！超越BERT！多粒度token预训练语言模型

专知会员服务

41+阅读 · 2020年8月31日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

【AAAI2021】知识图谱增强的预训练模型的生成式常识推理

【AAAI2021】知识图谱增强的预训练模型的生成式常识推理

专知

29+阅读 · 2021年1月25日

字节跳动李航提出AMBERT！超越BERT！多粒度token预训练语言模型

字节跳动李航提出AMBERT！超越BERT！多粒度token预训练语言模型

专知

18+阅读 · 2020年8月31日

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning，33页ppt

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning，33页ppt

专知

72+阅读 · 2020年2月29日

NAACL 2019 | 一种考虑缓和KL消失的简单VAE训练方法

NAACL 2019 | 一种考虑缓和KL消失的简单VAE训练方法

PaperWeekly

20+阅读 · 2019年4月24日

语义Web知识库补全关键技术研究

国家自然科学基金

17+阅读 · 2017年12月31日

基于多样化查询的多标记主动学习研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

变换结构方程模型的非参数贝叶斯分析

国家自然科学基金

4+阅读 · 2014年12月31日

高维复杂结构数据降维

国家自然科学基金

10+阅读 · 2014年12月31日

Investigating Model Editing for Unlearning in Large Language Models

Arxiv

0+阅读 · 12月23日

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

Arxiv

0+阅读 · 12月23日

G2rammar: Bilingual Grammar Modeling for Enhanced Text-attributed Graph Learning

Arxiv

0+阅读 · 12月22日

ARC: Leveraging Compositional Representations for Cross-Problem Learning on VRPs

Arxiv

0+阅读 · 12月21日

Distributed Asymmetric Allocation: A Topic Model for Large Imbalanced Corpora in Social Sciences

Arxiv

0+阅读 · 12月19日

VIP会员

文章信息

相关主题

相关VIP内容

【CVPR2022】MSDN: 零样本学习的互语义蒸馏网络

【CVPR2022】MSDN: 零样本学习的互语义蒸馏网络

专知会员服务

21+阅读 · 2022年3月8日

EMNLP 2021 | 预训练跨语言模型中的大词表构建及使用

EMNLP 2021 | 预训练跨语言模型中的大词表构建及使用

专知会员服务

22+阅读 · 2022年1月5日

字节跳动李航提出AMBERT！超越BERT！多粒度token预训练语言模型

字节跳动李航提出AMBERT！超越BERT！多粒度token预训练语言模型

专知会员服务

41+阅读 · 2020年8月31日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

热门VIP内容

开通专知VIP会员享更多权益服务

【斯坦福博士论文】数据、决策与过度依赖：构建可信人工智能的核心挑战

《多域时代中维持弹性军事训练：挑战与机遇》

【AAAI2026】专家数量何为最优？面向混合专家模型的语义专业化优化研究

自进化人工智能体的全面综述：连接基础模型与终身自主智能系统的新范式

相关资讯

【AAAI2021】知识图谱增强的预训练模型的生成式常识推理

【AAAI2021】知识图谱增强的预训练模型的生成式常识推理

专知

29+阅读 · 2021年1月25日

字节跳动李航提出AMBERT！超越BERT！多粒度token预训练语言模型

字节跳动李航提出AMBERT！超越BERT！多粒度token预训练语言模型

专知

18+阅读 · 2020年8月31日

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning，33页ppt

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning，33页ppt

专知

72+阅读 · 2020年2月29日

NAACL 2019 | 一种考虑缓和KL消失的简单VAE训练方法

NAACL 2019 | 一种考虑缓和KL消失的简单VAE训练方法

PaperWeekly

20+阅读 · 2019年4月24日

相关论文

Investigating Model Editing for Unlearning in Large Language Models

Arxiv

0+阅读 · 12月23日

FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models

Arxiv

0+阅读 · 12月23日

G2rammar: Bilingual Grammar Modeling for Enhanced Text-attributed Graph Learning

Arxiv

0+阅读 · 12月22日

ARC: Leveraging Compositional Representations for Cross-Problem Learning on VRPs

Arxiv

0+阅读 · 12月21日

Distributed Asymmetric Allocation: A Topic Model for Large Imbalanced Corpora in Social Sciences

Arxiv

0+阅读 · 12月19日

相关基金

语义Web知识库补全关键技术研究

国家自然科学基金

17+阅读 · 2017年12月31日

基于多样化查询的多标记主动学习研究

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

变换结构方程模型的非参数贝叶斯分析

国家自然科学基金

4+阅读 · 2014年12月31日

高维复杂结构数据降维

国家自然科学基金

10+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员