The task of identifying the author of a text has been studied for several decades, using methods from linguistics, statistics, and, more recently, machine learning. Motivated by the impressive performance gains of transformer models across a broad range of natural language processing tasks, and by the recent availability of the large-scale PAN authorship dataset, we first study the effectiveness of several BERT-like transformers for the task of authorship verification. These models consistently achieve very high scores. Next, we show empirically that they exploit existing biases in the dataset, focusing on topical clues rather than on characteristics of the author's writing style. To address this problem, we provide new splits for PAN-2020 in which training and test data are sampled from disjoint topics or authors. Finally, we introduce DarkReddit, a dataset with a different input data distribution. We use it to analyze the domain generalization performance of the models in a low-data regime, and to examine how performance varies when the proposed PAN-2020 splits are used for fine-tuning. We show that these splits enhance the models' ability to transfer knowledge to a new, significantly different dataset.
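As a concrete illustration, the sketch below shows one common way to frame authorship verification with a BERT-like encoder: binary classification over a pair of texts. This is not the paper's exact setup; the model name and the use of an untrained classification head are assumptions made for brevity, and real use would first fine-tune on PAN-2020 verification pairs.

```python
# Minimal sketch (assumptions noted above): authorship verification as
# binary classification over a text pair with a BERT-like encoder,
# using the Hugging Face `transformers` library.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-cased"  # assumed; the paper evaluates several BERT-like models

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def same_author_probability(text_a: str, text_b: str) -> float:
    """Return P(same author) for a pair of texts.

    The pair is encoded jointly ([CLS] text_a [SEP] text_b [SEP]) and
    truncated to the encoder's maximum input length.
    """
    inputs = tokenizer(text_a, text_b, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Index 1 is taken to be the "same author" class by convention here.
    return torch.softmax(logits, dim=-1)[0, 1].item()
```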
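The author-disjoint resplit can be sketched as follows. This is an illustrative procedure, not the actual PAN-2020 resplitting code: the field names `author_a` and `author_b` are assumed, and pairs straddling the train/test boundary are simply dropped to keep the author sets disjoint. A topic-disjoint split would follow the same pattern with topic labels in place of author labels.

```python
# Minimal sketch of an author-disjoint split (assumed schema, see above).
import random

def author_disjoint_split(pairs, test_fraction=0.2, seed=0):
    """Split verification pairs so that no author appears in both train and test."""
    authors = sorted({a for p in pairs for a in (p["author_a"], p["author_b"])})
    rng = random.Random(seed)
    rng.shuffle(authors)
    test_authors = set(authors[: int(test_fraction * len(authors))])
    train, test = [], []
    for p in pairs:
        pair_authors = {p["author_a"], p["author_b"]}
        if pair_authors <= test_authors:
            test.append(p)
        elif pair_authors.isdisjoint(test_authors):
            train.append(p)
        # Pairs with one author on each side are dropped to preserve disjointness.
    return train, test
```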