Aligning a medium-size GPT model in English to a small closed domain in Spanish using reinforcement learning

Decoding · Open domain · Perplexity · Reinforcement learning · BLEU

April 3, 2023


Oscar R. Navarrete-Parra, Victor Uc-Cetina, Jorge Reyes-Magana

From arXiv; under review in the journal Procesamiento del Lenguaje Natural.

In this paper, we propose a methodology to align a medium-sized GPT model, originally trained in English for an open domain, to a small closed domain in Spanish. The model is fine-tuned for the question answering task. To achieve this, we also trained and implemented another neural network (which we call the reward model) that scores an answer and determines whether it is appropriate for a given question. This component served to improve the decoding and generation of the system's answers. Numerical metrics such as BLEU and perplexity were used to evaluate the model, and human judgment was used to compare the decoding technique with others. The results favored the proposed method, showing that it is feasible to use a reward model to align the generation of responses.
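The abstract does not spell out how the reward model is wired into decoding, so the following is only a minimal sketch of one common pattern: generate several candidate answers, score each against the question, and keep the highest-scoring one. Everything here is an assumption for illustration. `toy_reward` is a hypothetical word-overlap stand-in for the trained neural reward model, and `pick_best_answer`, the question, and the candidate answers are invented examples, not taken from the paper.

```python
from typing import Callable, List, Tuple


def toy_reward(question: str, answer: str) -> float:
    """Hypothetical stand-in for a trained reward model.

    Returns the fraction of distinct question words that also
    appear in the answer (a crude relevance signal).
    """
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & a_words) / len(q_words)


def pick_best_answer(
    question: str,
    candidates: List[str],
    reward_fn: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Score every candidate answer and return the best one with its score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    scored = [(reward_fn(question, c), c) for c in candidates]
    best_score, best_answer = max(scored, key=lambda pair: pair[0])
    return best_answer, best_score


# Invented example data: in practice the candidates would be sampled
# from the fine-tuned GPT model rather than written by hand.
question = "¿Cuál es el horario de la biblioteca?"
candidates = [
    "No lo sé.",
    "El horario de la biblioteca es de 8 a 20 horas.",
    "La cafetería abre a las 7.",
]
best, score = pick_best_answer(question, candidates, toy_reward)
print(best, round(score, 3))
```

Passing the scorer in as `reward_fn` keeps the selection logic independent of the scoring model, so a learned reward model could replace the toy function without changing `pick_best_answer`.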

