A Semi-automatic Data Extraction System for Heterogeneous Data Sources: A Case Study from Cotton Industry

5 November 2021

Richi Nayak, Thirunavukarasu Balasubramaniam, Sangeetha Kutty, Sachindra Banduthilaka, Erin Peterson

From arXiv; accepted at the 19th Australasian Data Mining Conference (2021).

With recent developments in digitisation, an increasing number of documents are available online, and several information extraction tools exist for extracting information from digitised documents. However, identifying precise answers to a given query is often challenging, especially when the data source holding the relevant information is unknown. The task becomes more complex when the data sources come in multiple formats, such as PDFs, tables and HTML pages. In this paper, we propose a novel data extraction system, based on text mining approaches, that discovers relevant and focused information from diverse unstructured data sources. We perform a qualitative analysis, using the cotton industry as a case study, to evaluate the proposed system's suitability and adaptability.
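The abstract does not describe the system's internals, so the following is only a rough illustration of the general idea, not the authors' method. It is a minimal sketch that assumes each source can be reduced to short text passages (PDF text extraction is assumed to happen upstream with a separate tool), splits HTML pages and CSV tables into passages, and ranks all passages against a query with a simple TF-IDF keyword score. All function names and the sample data are hypothetical.

```python
import csv
import io
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _VisibleTextParser(HTMLParser):
    """Collects visible text fragments from HTML, skipping <script>/<style> content."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.fragments.append(text)


def html_to_passages(html):
    """Hypothetical helper: one passage per visible text fragment of an HTML page."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.fragments


def csv_to_passages(table_text):
    """Hypothetical helper: one passage per table row, each cell labelled with its column header."""
    rows = list(csv.reader(io.StringIO(table_text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [", ".join(f"{h}: {v}" for h, v in zip(header, row)) for row in body]


def tokenize(text):
    """Lower-cases text and splits it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_passages(query, passages):
    """Scores each passage by the sum of tf * idf over query terms; drops zero scores."""
    term_counts = [Counter(tokenize(p)) for p in passages]
    n = len(passages)
    # Document frequency: number of passages containing each term.
    df = Counter(term for counts in term_counts for term in counts)
    # Smoothed inverse document frequency, so very common terms still count a little.
    idf = {term: math.log((1 + n) / (1 + freq)) + 1 for term, freq in df.items()}
    query_terms = set(tokenize(query))
    scored = []
    for passage, counts in zip(passages, term_counts):
        score = sum(counts[t] * idf[t] for t in query_terms if t in counts)
        if score > 0:
            scored.append((score, passage))
    # Highest score first; Python's sort is stable, so ties keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy inputs standing in for an HTML page, a CSV table and text already pulled from a PDF.
    passages = (
        html_to_passages("<h1>Cotton report</h1><p>Average cotton yield rose.</p>"
                         "<script>var x = 1;</script>")
        + csv_to_passages("region,yield\nA,10\nB,12\n")
        + ["Irrigation water use is reported separately."]
    )
    for score, passage in rank_passages("cotton yield", passages):
        print(f"{score:.2f}  {passage}")
```

With the toy inputs, the HTML paragraph that mentions both query terms ranks first, the heading and the two table rows each match one term, and the irrigation sentence is dropped. A real system of the kind the abstract describes would likely need proper PDF and table parsing and richer text mining than keyword overlap; the flat passage list here is just the simplest way to put heterogeneous formats on a common footing for ranking.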
