Unified Text Structuralization with Instruction-tuned Language Models


Text structure · Information extraction · Structuralization · Structure · Extraction

March 30, 2023



Xuanfan Ni, Piji Li, Huayang Li

From arXiv; 13 pages, 5 figures.

Text structuralization is one of the important fields of natural language processing (NLP); it consists of information extraction (IE) and structure formalization. However, current studies of text structuralization suffer from a shortage of high-quality, manually annotated datasets across different domains and languages, and building such datasets requires specialized professional knowledge. In addition, most IE methods are designed for a specific type of structured data, e.g., entities, relations, and events, which makes them hard to generalize to other types. In this work, we propose a simple and efficient approach to instruct a large language model (LLM) to extract a variety of structures from texts. More concretely, before feeding the text into an LLM, we add a prefix instruction indicating the desired IE task and a suffix instruction indicating the desired structure type. Experiments on two LLMs show that this approach enables language models to perform comparably to other state-of-the-art methods on datasets covering a variety of languages and knowledge domains, and that it can generalize to other IE sub-tasks by changing the content of the instruction. Another benefit of our approach is that it can help researchers build datasets at low cost in low-resource and domain-specific scenarios, e.g., finance and law.
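The prefix/suffix instruction scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the instruction wording in `TASK_PREFIX` and `STRUCTURE_SUFFIX`, the one-tuple-per-line output format, and the helper names (`build_prompt`, `parse_tuples`, `extract`, `fake_llm`) are all hypothetical. The `llm` parameter stands in for any function that maps a prompt string to a completion string, so no particular model API is assumed.

```python
from typing import Callable, List, Tuple

# Hypothetical instruction templates. The abstract only states that a prefix
# names the IE task and a suffix names the structure type; this wording is
# illustrative, not taken from the paper.
TASK_PREFIX = {
    "ner": "Extract all named entities from the following text.",
    "re": "Extract all relations between entities in the following text.",
    "ee": "Extract all events and their arguments from the following text.",
}
STRUCTURE_SUFFIX = {
    "triple": "Output one (head, relation, tail) triple per line.",
    "pair": "Output one (entity, type) pair per line.",
}


def build_prompt(task: str, structure: str, text: str) -> str:
    """Wrap the input text with a task prefix and a structure-type suffix."""
    return f"{TASK_PREFIX[task]}\n\n{text}\n\n{STRUCTURE_SUFFIX[structure]}"


def parse_tuples(output: str) -> List[Tuple[str, ...]]:
    """Parse lines such as '(a, b, c)' into tuples, skipping other lines.

    Assumes the hypothetical output format requested by the suffix; a naive
    comma split will mis-parse entity names that themselves contain commas.
    """
    results = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("(") and line.endswith(")"):
            results.append(tuple(part.strip() for part in line[1:-1].split(",")))
    return results


def extract(text: str, task: str, structure: str,
            llm: Callable[[str], str]) -> List[Tuple[str, ...]]:
    """Prompt an LLM (any str -> str callable) and parse its structured output."""
    return parse_tuples(llm(build_prompt(task, structure, text)))


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model, returning a fixed completion."""
    return "(Marie Curie, born_in, Warsaw)"


print(extract("Marie Curie was born in Warsaw.", "re", "triple", fake_llm))
# [('Marie Curie', 'born_in', 'Warsaw')]
```

In this sketch, switching to another IE sub-task only changes which prefix (and, if needed, which suffix) is selected, which mirrors the abstract's claim that the approach generalizes by changing the content of the instruction rather than the model.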



