BART原理简介与代码实战

2020 年 4 月 1 日 AINLP

写在前面

最近huggingface的transformer库,增加了BART模型,Bart是该库中最早的Seq2Seq模型之一,在文本生成任务,例如摘要抽取方面达到了SOTA的结果。本次放出了三组不同的预训练权重:

  • bart-large:基础预训练模型;
  • bart-large-cnn:基础模型在 CNN/Daily Mail Abstractive Summarization Task微调后的模型;
  • bart-large-mnli:基础模型在MNLI classification task微调后的模型;

下面我们来看看BART。

背景:Seq2Seq预训练

去年10月,来自Google和Facebook的团队分别发布了新的Transformer-related论文:T5BART。这两篇论文在如抽象总结和对话等生成任务上都取得了更好的下游性能,主要有两个改变:

  • 在BERT的双向编码器架构中添加因果解码器;
  • 用更复杂的预训练任务代替BERT的完形填空任务。

现在让我们更深入地研究Seq2Seq预训练思想!

BERT vs. GPT2

正如BART作者在论文中写的,

(BART) can be seen as generalizing Bert (due to the bidirectional encoder) and GPT2 (with the left to right decoder).

BERT

BERT最重要的预训练任务是预测masked token,并使用整个输入来获取更完全的信息以进行更准确的预测。这对于那些允许利用位置 之后的信息来预测位置 的任务是有效的,但是对于诸如文本生成之类的任务则没有多大用处,这些对位置i的预测只能取决于先前生成的单词。

在BERT源码中,在预测位置 时可以使用哪些信息是由由一个称为attention_mask的参数来控制的, 注意掩码中的值为1表示模型在预测行的单词时可以利用的列单词的信息。

下图是BERT的"Fully-visible" 注意力矩阵,关于BERT更为详细的讲解可以参考往期文章:

GPT

GPT预训练任务使用的是autoregressive的思想,使用已经解码出的信息来预测下一个位置。该种模式对于生成任务更为有效,而对于那些可以使用全局输入来得到输出的下游任务则比较差劲。

同样的,给出GPT的注意力矩阵,在这里,当我们预测eating时,可以使用的信息只有<BOS> I love

Encoder-Decoder

我们的新朋友,例如BART,可以做到两全其美。

其中Encoder的注意力矩阵是Fully-visible的,而Decoder的注意力矩阵是autoregressive,编码器和解码器通过cross attention连接,其中每个解码器层都对编码器输出的最终隐藏状态进行attention操作,这会使得模型生成与原始输入紧密相关的输出。

预训练模式

Bart和T5在预训练时都将文本span用掩码替换, 然后让模型学着去重建原始文档。(PS.这里进行了简化, 这两篇论文都对许多不同的预训练任务进行了实验,发现这一方法表现良好。T5使用replace corrupted spans任务, 没有进行mask操作,而是选择了随机token进行替换。)

BART论文的图很好地说明了这一点:在上述示例中,原始文档为A B C D E。在编码之前将文本[C,D]屏蔽掉,又在B之前插入一个额外的掩码,然后将损坏的文档A _ B _ E作为编码器的输入。解码器必须使用编码器的输出和先前未损坏的标记来重建原始文档。

Summarization

摘要生成任务中,输入序列是我们要总结的文档,输出序列是一段事实摘要。Seq2Seq架构可直接用于摘要任务,而无需任何新的操作, 并且预训练任务也非常适合下游任务。 下表中的数字证实了这一点:在CNN / Daily Mail抽象摘要任务中,所有新的Seq2Seq模型都比那些old less-fancy模型做得好得多,而BART的表现尤其出色。

  • BertSumABS来自论文《Text Summarization with Pretrained Encoders》,使用Seq2Seq结构,但没有对解码器进行预训练。 TransformerAbs来自同一篇论文,使用稍小的模型,并且也没有预训练。
  • PT-Gen来自论文《 Get To The Point: Summarization with Pointer-Generator Networks》
  • UniLM是一种“ Prefix-LM”,具有与Bart和T5相似的masking策略

Demo: BartForConditionalGeneration

这一节来看看如何用几行代码就完成一个摘要抽取任务。

Step 1

首先安装最新版本的transformers和一些必要的库

!pip install transformers --upgrade
import torch
try:
import transformers
from transformers import BartTokenizer, BartForConditionalGeneration
except ImportError:
raise ImportError(INSTALL_MSG)
from IPython.display import display, Markdown

Step 2

找到一篇待抽取的长长长文本

torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'

LONG_BORING_TENNIS_ARTICLE = """
Andy Murray came close to giving himself some extra preparation time for his w
edding next week before ensuring that he still has unfinished tennis business to
attend to. The world No 4 is into the semi-finals of the Miami Open, but not be
fore getting a scare from 21 year-old Austrian Dominic Thiem, who pushed him to
4-4 in the second set before going down 3-6 6-4, 6-1 in an hour and three quarte
rs. Murray was awaiting the winner from the last eight match between Tomas Berdy
ch and Argentina's Juan Monaco. Prior to this tournament Thiem lost in the secon
d round of a Challenger event to soon-to-be new Brit Aljaz Bedene. Andy Murray p
umps his first after defeating Dominic Thiem to reach the Miami Open semi finals
. Muray throws his sweatband into the crowd after completing a 3-6, 6-4, 6-1 vi
ctory in Florida . Murray shakes hands with Thiem who he described as a 'strong
guy' after the game . And Murray has a fairly simple message for any of his fell
ow British tennis players who might be agitated about his imminent arrival into
the home ranks: don't complain. Instead the British No 1 believes his colleagues
should use the assimilation of the world number 83, originally from Slovenia, a
s motivation to better themselves. At present any grumbles are happening in priv
ate, and Bedene's present ineligibility for the Davis Cup team has made it less
of an issue, although that could change if his appeal to play is allowed by the
International Tennis Federation. Murray thinks anyone questioning the move, now
it has become official, would be better working on getting their ranking closer
to his. 'If he was 500 in the world they wouldn't be that fussed about it but ob
viously he threatens their position a bit,' said the 27 year-old Scot. ' and he'
s obviously the British number two, comfortably. 'So they can complain but the b
est thing to do is use it in the right way and accept it for what it is, and try
to use it as motivation whether they agree with it or not. He's British now so
they've just got to deal with it. Murray stretches for a return after starting h
is quarter final match slowly on the show court . Thiem held nothing back as he
raced through the opening set, winning it 6-3 with a single break . The young Au
strian is considered to be one of the hottest prospects on the ATP Tour . 'I wou
ld hope that all the guys who are below him now like James (Ward) , Kyle (Edmund
) , Liam (Broady) they will use it as motivation. If he becomes eligible for Dav
is Cup then those guys are going to have to prove themselves. 'It can only be se
en as a positive for those guys using it to try to get better. He's a good playe
r but so are James and Kyle and Liam has improved. Aljaz is there, he's on the t
our every week, the other guys aren't quite there yet.' For the first time Murra
y, who has an encyclopaedic knowledge of the top 100, gave his opinion of Bedene
: 'He's a good player with a very good serve. He's a legitimate top 100 player,
when he plays Challengers he's there or thereabouts, when he plays on the main t
our he wins matches, it's not like he turns up and always loses in the first rou
nd. Murray's fiancee was once again watching from the stands shaded by a huge br
immed hat . Kim Sears flashes her enormous diamond engagement ring while watchin
g her beau on court . 'He had a bad injury last year (wrist) but has recovered w
ell. I would imagine he would keep moving up the rankings although I don't know
exactly how high he can go. I've practised with him a couple of times, I haven't
seen him play loads, but when you serve as well as he does it helps. I would im
agine he' s going to be comfortably in the top 70 or 80 in the world for a while
.' It is understood the Lawn Tennis Association will give background support to
his case regarding the Davis Cup but have made it clear that the onus is on him
to lead the way. An official statement said: 'To have another player in the men'
s top 100 is clearly a positive thing for British tennis and so we very much wel
come Aljaz's change in citizenship.' The last comparable switch came twenty year
s ago when Greg Rusedski arrived from Canada. It was by no means universally pop
ular but, like Bedene, he pledged that he was in for the long haul and, in fairn
ess to him, he proved true to his word. Loising the first set shocked Murray int
o life as he raced to a commanding lead in the second . The No 3 seed sent over
a few glaring looks towards his team before winning the second set . Murray had
to put such matters aside as he tackled the unusually talented Thiem, a delight
to watch. Coached by Boris Becker's veteran mentor Gunter Bresnik, he slightly r
esembles Andy Roddick and hits with similar power but more elegance. His single
handed backhand is a thing of rare beauty. However, he has had a mediocre season
coming into this event and there was little to forewarn of his glorious shotmak
ing that seemed to catch Murray unawares early on. The world No 4 looked to have
worked him out in the second, but then suffered one of his periopdic mental lap
ses and let him back in from 4-1 before closing it out with a break. After break
ing him for 3-1 in the decider the Austrian whirlwind burnt itself out. 'He's a
strong guy who hits the ball hard and it became a very physical match,' said Mur
ray. Murray was presented with a celebratory cake after winning his 500th match
in the previous round .
"""
.replace('\n','')

Step 3

只需几行代码调用transformers库中的BART,进行摘要生成

BART生成的摘要效果:Andy Murray beat Dominic Thiem 3-6, 6-4, 6-1 in the Miami Open. The world No 4 is into the semi-finals of the tournament in Florida. Murray was awaiting the winner from the last eight match between Tomas Berdych and Argentina's Juan Monaco. Thiem lost in the second round of a Challenger event to Aljaz Bedene.

再来看看GPT的效果,公平的说,它并不适合摘要生成的任务。

GPT生成的效果:'To have a player like James Ward, Kyle Edmund, Liam Broady and Aljaz Bedene in the top 100 is a huge achievement for the Lawn Tennis Association. The Lawn Tennis Association is committed to the development of the sport and the development of the sport's players. The Lawn Tennis Association is committed to the development of the sport and the development of the sport's players. The Lawn Tennis Association is committed to the development of the sport and the development of the sport's players. The Lawn Tennis Association is committed to the development of the sport and the development of the sport's players. The Lawn Tennis Association is committed to the development of the sport and the development of the sport's players. The Lawn Tennis Association is committed to the development of the sport and the development of the

最后,大家可以在我共享的colab里玩一玩上述demo:    https://colab.research.google.com/drive/11hKBPfsfBXPKo-dK_gHsPklF4PcNflQZ

或者直接去transformers的官方github库里查看:https://github.com/huggingface/transformers

Enjoy!

- END -

如何优雅地训练大型模型? 2020-03-03
Transformers Assemble(PART IV) 2020-02-24
Transformers Assemble(PART III) 2020-02-20


推荐阅读

AINLP年度阅读收藏清单

自动作诗机&藏头诗生成器:五言、七言、绝句、律诗全了

逆向而行,中文轻量级预训练模型的探索之路

From Word Embeddings To Document Distances 阅读笔记

模型压缩实践系列之——bert-of-theseus,一个非常亲民的bert压缩方法

这门斯坦福大学自然语言处理经典入门课,我放到B站了

可解释性论文阅读笔记1-Tree Regularization

关于AINLP

AINLP 是一个有趣有AI的自然语言处理社区,专注于 AI、NLP、机器学习、深度学习、推荐算法等相关技术的分享,主题包括文本摘要、智能问答、聊天机器人、机器翻译、自动生成、知识图谱、预训练模型、推荐系统、计算广告、招聘信息、求职经验分享等,欢迎关注!加技术交流群请添加AINLPer(id:ainlper),备注工作/研究方向+加群目的。


登录查看更多
7

相关内容

【2020新书】实战R语言4,323页pdf
专知会员服务
100+阅读 · 2020年7月1日
《深度学习》圣经花书的数学推导、原理与Python代码实现
【教程】自然语言处理中的迁移学习原理,41 页PPT
专知会员服务
95+阅读 · 2020年2月8日
Transformer文本分类代码
专知会员服务
116+阅读 · 2020年2月3日
一网打尽!100+深度学习模型TensorFlow与Pytorch代码实现集合
【干货】用BRET进行多标签文本分类(附代码)
专知会员服务
84+阅读 · 2019年12月27日
【ICLR2020】五篇Open代码的GNN论文
专知会员服务
47+阅读 · 2019年10月2日
听说你还没读过 Bert 源码?
AINLP
7+阅读 · 2019年8月7日
深度学习文本分类方法综述(代码)
中国人工智能学会
28+阅读 · 2018年6月16日
一文详解循环神经网络的基本概念(代码版)
人工智能头条
5+阅读 · 2018年4月28日
手把手教 | 深度学习库PyTorch(附代码)
数据派THU
27+阅读 · 2018年3月15日
论文 | 基于CNN的目标检测算法
七月在线实验室
9+阅读 · 2017年12月7日
推荐|深度学习PyTorch的教程代码
全球人工智能
10+阅读 · 2017年10月8日
最大熵原理(一)
深度学习探索
12+阅读 · 2017年8月3日
Talking-Heads Attention
Arxiv
15+阅读 · 2020年3月5日
How to Fine-Tune BERT for Text Classification?
Arxiv
13+阅读 · 2019年5月14日
Conditional BERT Contextual Augmentation
Arxiv
8+阅读 · 2018年12月17日
Recurrent Fusion Network for Image Captioning
Arxiv
3+阅读 · 2018年7月31日
VIP会员
相关VIP内容
【2020新书】实战R语言4,323页pdf
专知会员服务
100+阅读 · 2020年7月1日
《深度学习》圣经花书的数学推导、原理与Python代码实现
【教程】自然语言处理中的迁移学习原理,41 页PPT
专知会员服务
95+阅读 · 2020年2月8日
Transformer文本分类代码
专知会员服务
116+阅读 · 2020年2月3日
一网打尽!100+深度学习模型TensorFlow与Pytorch代码实现集合
【干货】用BRET进行多标签文本分类(附代码)
专知会员服务
84+阅读 · 2019年12月27日
【ICLR2020】五篇Open代码的GNN论文
专知会员服务
47+阅读 · 2019年10月2日
相关资讯
听说你还没读过 Bert 源码?
AINLP
7+阅读 · 2019年8月7日
深度学习文本分类方法综述(代码)
中国人工智能学会
28+阅读 · 2018年6月16日
一文详解循环神经网络的基本概念(代码版)
人工智能头条
5+阅读 · 2018年4月28日
手把手教 | 深度学习库PyTorch(附代码)
数据派THU
27+阅读 · 2018年3月15日
论文 | 基于CNN的目标检测算法
七月在线实验室
9+阅读 · 2017年12月7日
推荐|深度学习PyTorch的教程代码
全球人工智能
10+阅读 · 2017年10月8日
最大熵原理(一)
深度学习探索
12+阅读 · 2017年8月3日
Top
微信扫码咨询专知VIP会员