Standard fine-tuning of language models typically performs well on in-distribution data but struggles to generalize under distribution shifts. In this work, we aim to improve the generalization of adapter-based cross-lingual task transfer, where such cross-language distribution shifts are inherent. We investigate scheduled unfreezing algorithms -- originally proposed to mitigate catastrophic forgetting in transfer learning -- for fine-tuning task adapters in cross-lingual transfer. Our experiments show that scheduled unfreezing methods close the gap to full fine-tuning and achieve state-of-the-art transfer performance, suggesting that these methods go beyond merely mitigating catastrophic forgetting. To better understand these empirical findings, we then study the learning dynamics of scheduled unfreezing through the lens of Fisher Information. Our in-depth experiments reveal that scheduled unfreezing induces learning dynamics different from those of standard fine-tuning and provide evidence that the dynamics of Fisher Information during training correlate with cross-lingual generalization performance. We further propose a general scheduled unfreezing algorithm that improves over standard fine-tuning by an average of 2 points across four datasets, and we provide strong empirical evidence for a theory-based justification of the heuristic unfreezing schedule (i.e., the heuristic schedule implicitly maximizes Fisher Information). Our code will be publicly available.
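To make the heuristic schedule concrete, below is a minimal sketch of scheduled (gradual) unfreezing applied to adapter fine-tuning, assuming a PyTorch model whose adapter parameters can be grouped per transformer layer. The helper `get_adapter_layers` and the `unfreeze_interval` parameter are illustrative assumptions, not the paper's released implementation; the paper's general algorithm additionally ties the schedule to Fisher Information dynamics, which this sketch does not reproduce.

```python
# Sketch of heuristic scheduled unfreezing for task adapters (assumption-based,
# not the authors' code): adapters start frozen and are unfrozen one layer at a
# time, from the top transformer layer downward, every `unfreeze_interval` steps.
import torch

def get_adapter_layers(model):
    """Hypothetical helper: return adapter parameter groups, one list per
    transformer layer, ordered from the top (last) layer to the bottom (first)."""
    raise NotImplementedError

def train_with_scheduled_unfreezing(model, loader, num_epochs, unfreeze_interval=100):
    layers = get_adapter_layers(model)  # top-to-bottom order
    # Start with every adapter frozen; only already-trainable parameters
    # (e.g., the task head) are optimized at step 0.
    for group in layers:
        for p in group:
            p.requires_grad = False

    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=1e-4)

    step, next_layer = 0, 0
    for _ in range(num_epochs):
        for batch in loader:
            # Unfreeze one more adapter layer every `unfreeze_interval` steps.
            if step % unfreeze_interval == 0 and next_layer < len(layers):
                for p in layers[next_layer]:
                    p.requires_grad = True
                optimizer.add_param_group({"params": layers[next_layer]})
                next_layer += 1

            loss = model(**batch).loss  # assumes a HF-style model returning .loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            step += 1
```

In this sketch the schedule is purely step-based; tracking the empirical Fisher Information of the unfrozen parameters during training (as the abstract describes) would require an additional diagnostic pass and is omitted here.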