Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? Existing benchmarks are not designed to measure progress in applied settings, and so don't directly answer this question. The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment. Baseline evaluations on RAFT reveal areas current techniques struggle with: reasoning over long texts and tasks with many classes. Human baselines show that some classification tasks are difficult for non-expert humans, reflecting that real-world value sometimes depends on domain expertise. Yet even non-expert human baseline F1 scores exceed GPT-3 by an average of 0.11. The RAFT datasets and leaderboard will track which model improvements translate into real-world benefits at https://raft.elicit.org.