优化人类-海洋合作,以高效高精密高效率地从文本文件提取信息 (Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents) - 专知论文

会员服务 ·

0

INFORMS · 查准率/准确率 · 信息抽取 · Automator · Unstructured ·

2023 年 2 月 18 日

Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents

翻译：优化人类-海洋合作,以高效高精密高效率地从文本文件提取信息

Bradley Butcher,Miri Zilka,Darren Cook,Jiri Hron,Adrian Weller

While humans can extract information from unstructured text with high precision and recall, this is often too time-consuming to be practical. Automated approaches, on the other hand, produce nearly-immediate results, but may not be reliable enough for high-stakes applications where precision is essential. In this work, we consider the benefits and drawbacks of various human-only, human-machine, and machine-only information extraction approaches. We argue for the utility of a human-in-the-loop approach in applications where high precision is required, but purely manual extraction is infeasible. We present a framework and an accompanying tool for information extraction using weak-supervision labelling with human validation. We demonstrate our approach on three criminal justice datasets. We find that the combination of computer speed and human understanding yields precision comparable to manual annotation while requiring only a fraction of time, and significantly outperforms fully automated baselines in terms of precision.

翻译：虽然人类可以从非结构化文本中以高精确度和高回顾方式提取信息,但这往往太费时,不切实际。另一方面,自动化方法产生近乎即时的结果,但对于精确度至关重要的高摄入应用而言可能不够可靠。在这项工作中,我们考虑了各种单人、人机和机器信息提取方法的利弊。我们主张在需要高精确度但纯粹人工提取是行不通的应用程序中,采用人到流方法的效用。我们提出了一个框架和配套工具,用于使用微弱的监视标签进行信息提取。我们在三个刑事司法数据集上展示了我们的方法。我们发现,计算机速度和人类理解的结合使得精确度与手动说明相仿,同时只需要一点时间,在精确性方面大大超过完全自动化的基线。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

33页PPT【AI+天气预测】，AI and Machine learning for weather predictions

33页PPT【AI+天气预测】，AI and Machine learning for weather predictions

专知会员服务

35+阅读 · 2022年3月5日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

空间分数阶质量守恒型Allen-Cahn方程的高效数值算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

认知天波超视距雷达低可探测目标检测方法研究

国家自然科学基金

2+阅读 · 2014年12月31日

基于超声射频信号的女性盆底功能障碍定量评价方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

路面破损实时三维加速度振动监测和图像识别的复合处理方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

建筑工程质量管理规范建模与检索的本体化方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

汉越双语语料库建设及词对齐方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

催化型氮杂Wittig反应合成多取代杂环的新方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

Puma和Bim在慢性淋巴细胞白血病细胞凋亡中的作用机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

Toll 样受体介导的巨噬细胞对prion清除的分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

An Entity-based Claim Extraction Pipeline for Real-world Biomedical Fact-checking

Arxiv

0+阅读 · 2023年4月11日

Approximating Human Evaluation of Social Chatbots with Prompting

Arxiv

0+阅读 · 2023年4月11日

Convolutional Neural Networks for Epileptic Seizure Prediction

Arxiv

0+阅读 · 2023年4月11日

Demon in the machine: learning to extract work and absorb entropy from fluctuating nanosystems

Arxiv

0+阅读 · 2023年4月10日

Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: A Preliminary Empirical Study

Arxiv

0+阅读 · 2023年4月10日

Large language models effectively leverage document-level context for literary translation, but critical errors persist

Arxiv

0+阅读 · 2023年4月7日

From Retrieval to Generation: Efficient and Effective Entity Set Expansion

Arxiv

0+阅读 · 2023年4月7日

A Survey of Human-in-the-loop for Machine Learning

Arxiv

35+阅读 · 2021年8月2日

A Survey on Data Augmentation for Text Classification

A Survey on Data Augmentation for Text Classification

Arxiv

16+阅读 · 2021年7月7日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

VIP会员

文章信息

相关主题

查准率/准确率

相关VIP内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

33页PPT【AI+天气预测】，AI and Machine learning for weather predictions

33页PPT【AI+天气预测】，AI and Machine learning for weather predictions

专知会员服务

35+阅读 · 2022年3月5日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《利用大语言模型（LLM）优化海军陆战队经验教训学习》2025年最新103页

《加拿大陆军顶层作战概念》2025最新33页

超越第一人称视角（FPV）无人机：汲取俄乌战争的全部教训

《瓦洛伦斯（ValoRens）项目 - 预测分析：解读敌方意图》

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

【推荐】图像分类必读开创性论文汇总

【推荐】图像分类必读开创性论文汇总

机器学习研究会

14+阅读 · 2017年8月15日

相关论文

An Entity-based Claim Extraction Pipeline for Real-world Biomedical Fact-checking

Arxiv

0+阅读 · 2023年4月11日

Approximating Human Evaluation of Social Chatbots with Prompting

Arxiv

0+阅读 · 2023年4月11日

Convolutional Neural Networks for Epileptic Seizure Prediction

Arxiv

0+阅读 · 2023年4月11日

Demon in the machine: learning to extract work and absorb entropy from fluctuating nanosystems

Arxiv

0+阅读 · 2023年4月10日

Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: A Preliminary Empirical Study

Arxiv

0+阅读 · 2023年4月10日

Large language models effectively leverage document-level context for literary translation, but critical errors persist

Arxiv

0+阅读 · 2023年4月7日

From Retrieval to Generation: Efficient and Effective Entity Set Expansion

Arxiv

0+阅读 · 2023年4月7日

A Survey of Human-in-the-loop for Machine Learning

Arxiv

35+阅读 · 2021年8月2日

A Survey on Data Augmentation for Text Classification

A Survey on Data Augmentation for Text Classification

Arxiv

16+阅读 · 2021年7月7日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

相关基金

空间分数阶质量守恒型Allen-Cahn方程的高效数值算法研究

国家自然科学基金

0+阅读 · 2015年12月31日

认知天波超视距雷达低可探测目标检测方法研究

国家自然科学基金

2+阅读 · 2014年12月31日

基于超声射频信号的女性盆底功能障碍定量评价方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

路面破损实时三维加速度振动监测和图像识别的复合处理方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

建筑工程质量管理规范建模与检索的本体化方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

汉越双语语料库建设及词对齐方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

催化型氮杂Wittig反应合成多取代杂环的新方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

Puma和Bim在慢性淋巴细胞白血病细胞凋亡中的作用机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

Toll 样受体介导的巨噬细胞对prion清除的分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员