WebIE: Faithful and Robust Information Extraction on the Web - 专知论文

会员服务 ·

0

INFORMS · 信息抽取 · WEB · MoDELS · Better ·

2023 年 5 月 23 日

WebIE: Faithful and Robust Information Extraction on the Web

翻译：暂无翻译

Chenxi Whitehouse,Clara Vania,Alham Fikri Aji,Christos Christodoulopoulos,Andrea Pierleoni

from arxiv, ACL 2023 Main Conference

Extracting structured and grounded fact triples from raw text is a fundamental task in Information Extraction (IE). Existing IE datasets are typically collected from Wikipedia articles, using hyperlinks to link entities to the Wikidata knowledge base. However, models trained only on Wikipedia have limitations when applied to web domains, which often contain noisy text or text that does not have any factual information. We present WebIE, the first large-scale, entity-linked closed IE dataset consisting of 1.6M sentences automatically collected from the English Common Crawl corpus. WebIE also includes negative examples, i.e. sentences without fact triples, to better reflect the data on the web. We annotate ~25K triples from WebIE through crowdsourcing and introduce mWebIE, a translation of the annotated set in four other languages: French, Spanish, Portuguese, and Hindi. We evaluate the in-domain, out-of-domain, and zero-shot cross-lingual performance of generative IE models and find models trained on WebIE show better generalisability. We also propose three training strategies that use entity linking as an auxiliary task. Our experiments show that adding Entity-Linking objectives improves the faithfulness of our generative IE models.

翻译：暂无翻译

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

专知

17+阅读 · 2018年6月7日

【论文推荐】最新七篇知识图谱相关论文—嵌入式知识、Zero-shot识别、知识图谱嵌入、网络库、变分推理、解释、弱监督

【论文推荐】最新七篇知识图谱相关论文—嵌入式知识、Zero-shot识别、知识图谱嵌入、网络库、变分推理、解释、弱监督

专知

19+阅读 · 2018年3月26日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

【论文推荐】最新5篇图像分割相关论文—条件随机场和深度特征学习、移动端网络、长期视觉定位、主动学习、主动轮廓模型、生成对抗性网络

【论文推荐】最新5篇图像分割相关论文—条件随机场和深度特征学习、移动端网络、长期视觉定位、主动学习、主动轮廓模型、生成对抗性网络

专知

13+阅读 · 2018年1月23日

S3AGA样本（Spitzer-SDSS Spectral Atlas of Galaxies and AGNs)及其AGN研究

国家自然科学基金

0+阅读 · 2014年12月31日

植物单宁影响布氏田鼠取食行为的神经机制

国家自然科学基金

0+阅读 · 2014年12月31日

基于用户标签软约束话题模型的微博资源建模研究

国家自然科学基金

0+阅读 · 2013年12月31日

流感病毒NS1蛋白抑制干扰素刺激基因表达及其机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

饥饿胁迫下龙眼乙烯和H2O2信号发生及其在脱落调节上的互作

国家自然科学基金

0+阅读 · 2011年12月31日

《计算机研究与发展》学术期刊

国家自然科学基金

1+阅读 · 2011年12月31日

miR-96参与老年性聋小鼠DBA/2J毛细胞凋亡的机制

国家自然科学基金

0+阅读 · 2011年12月31日

模-相对Hochschild同调与上同调

国家自然科学基金

0+阅读 · 2011年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

有机光伏电池中异质结微结构调控及光伏性能的优化研究

国家自然科学基金

0+阅读 · 2008年12月31日

Faithful Low-Resource Data-to-Text Generation through Cycle Training

Faithful Low-Resource Data-to-Text Generation through Cycle Training

Arxiv

0+阅读 · 2023年7月11日

GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

Arxiv

0+阅读 · 2023年7月11日

Inductive Relation Prediction from Relational Paths and Context with Hierarchical Transformers

Arxiv

0+阅读 · 2023年7月10日

Extracting Multi-valued Relations from Language Models

Arxiv

0+阅读 · 2023年7月7日

A Survey on Neural Recommendation: From Collaborative Filtering to Content and Context Enriched Recommendation

Arxiv

25+阅读 · 2021年4月27日

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Arxiv

10+阅读 · 2021年1月24日

CoDEx: A Comprehensive Knowledge Graph Completion Benchmark

Arxiv

10+阅读 · 2020年10月6日

Entity Context and Relational Paths for Knowledge Graph Completion

Arxiv

29+阅读 · 2020年2月17日

Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

Arxiv

40+阅读 · 2019年6月4日

Explainable Reasoning over Knowledge Graphs for Recommendation

Arxiv

11+阅读 · 2018年11月12日

VIP会员

文章信息

相关主题

相关VIP内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【ICCV2025教程】基础模型遇见具身智能体

军事机器学习设计：关于开发自动化任务摘要系统的梯次化设计科学研究 | 2025最新93页

扩散模型中的缓存方法综述：迈向高效的多模态生成

【ICCV2025教程】《迈向视觉语言模型的全面推理》

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

专知

17+阅读 · 2018年6月7日

【论文推荐】最新七篇知识图谱相关论文—嵌入式知识、Zero-shot识别、知识图谱嵌入、网络库、变分推理、解释、弱监督

【论文推荐】最新七篇知识图谱相关论文—嵌入式知识、Zero-shot识别、知识图谱嵌入、网络库、变分推理、解释、弱监督

专知

19+阅读 · 2018年3月26日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

【论文推荐】最新5篇信息抽取（IE）相关论文—开放信息抽取、不完整信息、主动学习、越南语、依存分析

专知

12+阅读 · 2018年2月2日

【论文推荐】最新5篇图像分割相关论文—条件随机场和深度特征学习、移动端网络、长期视觉定位、主动学习、主动轮廓模型、生成对抗性网络

【论文推荐】最新5篇图像分割相关论文—条件随机场和深度特征学习、移动端网络、长期视觉定位、主动学习、主动轮廓模型、生成对抗性网络

专知

13+阅读 · 2018年1月23日

相关论文

Faithful Low-Resource Data-to-Text Generation through Cycle Training

Faithful Low-Resource Data-to-Text Generation through Cycle Training

Arxiv

0+阅读 · 2023年7月11日

GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

Arxiv

0+阅读 · 2023年7月11日

Inductive Relation Prediction from Relational Paths and Context with Hierarchical Transformers

Arxiv

0+阅读 · 2023年7月10日

Extracting Multi-valued Relations from Language Models

Arxiv

0+阅读 · 2023年7月7日

A Survey on Neural Recommendation: From Collaborative Filtering to Content and Context Enriched Recommendation

Arxiv

25+阅读 · 2021年4月27日

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Arxiv

10+阅读 · 2021年1月24日

CoDEx: A Comprehensive Knowledge Graph Completion Benchmark

Arxiv

10+阅读 · 2020年10月6日

Entity Context and Relational Paths for Knowledge Graph Completion

Arxiv

29+阅读 · 2020年2月17日

Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

Arxiv

40+阅读 · 2019年6月4日

Explainable Reasoning over Knowledge Graphs for Recommendation

Arxiv

11+阅读 · 2018年11月12日

相关基金

S3AGA样本（Spitzer-SDSS Spectral Atlas of Galaxies and AGNs)及其AGN研究

国家自然科学基金

0+阅读 · 2014年12月31日

植物单宁影响布氏田鼠取食行为的神经机制

国家自然科学基金

0+阅读 · 2014年12月31日

基于用户标签软约束话题模型的微博资源建模研究

国家自然科学基金

0+阅读 · 2013年12月31日

流感病毒NS1蛋白抑制干扰素刺激基因表达及其机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

饥饿胁迫下龙眼乙烯和H2O2信号发生及其在脱落调节上的互作

国家自然科学基金

0+阅读 · 2011年12月31日

《计算机研究与发展》学术期刊

国家自然科学基金

1+阅读 · 2011年12月31日

miR-96参与老年性聋小鼠DBA/2J毛细胞凋亡的机制

国家自然科学基金

0+阅读 · 2011年12月31日

模-相对Hochschild同调与上同调

国家自然科学基金

0+阅读 · 2011年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

有机光伏电池中异质结微结构调控及光伏性能的优化研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员