End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies



April 3, 2023



Xuguang Ai, Ramakanth Kavuluru

From arXiv. Accepted to appear in IEEE ICHI 2023 (HealthNLP workshop). Tokenized dataset and code: https://github.com/bionlproc/end-to-end-ChemProt

End-to-end relation extraction (E2ERE) is an important information extraction task, especially in biomedicine, where the scientific literature continues to grow exponentially. E2ERE involves identifying both the entities (named entity recognition, NER) and the relations among them, whereas most RE tasks simply assume that the entities are given upfront and reduce to relation classification. E2ERE is inherently harder than RE alone because NER errors can snowball into further RE errors. A complex dataset for biomedical E2ERE is ChemProt (BioCreative VI, 2017), which annotates relations between chemical compounds and genes/proteins in the scientific literature. ChemProt is included in all recent biomedical natural language processing benchmarks, including BLUE, BLURB, and BigBio. However, with few exceptions, these benchmarks and other separate efforts do not treat it end-to-end. In this effort, we employ a span-based pipeline approach that achieves new state-of-the-art E2ERE performance on ChemProt, improving F1-score by more than 4% over the prior best effort. Our results indicate that a straightforward fine-grained tokenization scheme helps span-based approaches excel at E2ERE, especially in handling complex named entities. Our error analysis also identifies a few key failure modes in E2ERE for ChemProt.
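The abstract credits a fine-grained tokenization scheme with helping span-based models on complex entities. The sketch below illustrates that general idea under our own assumptions; it is not the authors' code (their actual scheme is in the linked repository). It assumes a tokenizer that splits at every non-alphanumeric character, a span-based NER step that enumerates all token spans up to a hypothetical maximum width, and a strict end-to-end F1 in which a predicted relation counts only if both argument spans (as character offsets) and the label match exactly. The example sentence, the function names, `max_width=8`, and the use of the ChemProt label `CPR:4` are illustrative choices. With whitespace tokens, a chemical embedded in a hyphenated word can never be predicted as an exact span, so the relation is lost even if the relation classifier is perfect, which is the snowball effect described above.

```python
import re
from typing import List, Set, Tuple

Token = Tuple[str, int, int]  # (text, start_char, end_char)
Span = Tuple[int, int]        # character offsets, end exclusive
Relation = Tuple[Span, Span, str]  # (chemical span, gene span, label)

# Assumed fine-grained scheme: runs of letters/digits form tokens, and every
# other non-space character (hyphen, slash, parenthesis, ...) is its own token.
FINE_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def fine_tokenize(text: str) -> List[Token]:
    return [(m.group(), m.start(), m.end()) for m in FINE_RE.finditer(text)]


def whitespace_tokenize(text: str) -> List[Token]:
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def enumerate_spans(n_tokens: int, max_width: int) -> List[Tuple[int, int]]:
    """All (first_token, last_token) spans, inclusive, of width <= max_width."""
    return [(i, j) for i in range(n_tokens)
            for j in range(i, min(i + max_width, n_tokens))]


def span_is_reachable(tokens: List[Token], char_start: int, char_end: int,
                      max_width: int) -> bool:
    """True if some candidate span covers exactly [char_start, char_end)."""
    for i, j in enumerate_spans(len(tokens), max_width):
        if tokens[i][1] == char_start and tokens[j][2] == char_end:
            return True
    return False


def strict_f1(gold: Set[Relation], pred: Set[Relation]) -> float:
    """Micro F1 where spans and label must all match exactly."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "aspirin-mediated inhibition of COX-2"  # hypothetical example
    chem = (0, len("aspirin"))
    gene = (text.index("COX-2"), len(text))
    for name, tok in [("whitespace", whitespace_tokenize), ("fine", fine_tokenize)]:
        print(name, span_is_reachable(tok(text), *chem, max_width=8))
    gold = {(chem, gene, "CPR:4")}
    print(strict_f1(gold, {((0, 16), gene, "CPR:4")}))  # 0.0: chemical span too wide
    print(strict_f1(gold, {(chem, gene, "CPR:4")}))     # 1.0
```

Splitting at every punctuation mark produces more tokens, so the span enumeration grows; a real system would bound `max_width` and may merge some splits back, which is a trade-off this sketch does not explore.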

