Graph-based Semantical Extractive Text Analysis - 专知论文


December 19, 2022


In the past few decades, there has been an explosion in the amount of available data produced from various sources on different topics. The availability of this enormous amount of data requires us to adopt effective computational tools to explore it. This has led to intense and growing interest in the research community in developing computational methods for processing this text data. One line of study focuses on condensing text so that we can reach a higher level of understanding in a shorter time. The two important tasks for doing this are keyword extraction and text summarization. In keyword extraction, we are interested in finding the key words in a text, which familiarizes us with its general topic. In text summarization, we are interested in producing a short text that includes the important information in the document. The TextRank algorithm, an unsupervised learning method that extends PageRank (the base algorithm the Google search engine uses for searching and ranking pages), has shown its efficacy in large-scale text mining, especially for text summarization and keyword extraction. This algorithm can automatically extract the important parts of a text (keywords or sentences) and declare them as the result. However, it neglects the semantic similarity between the different parts. In this work, we improve the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text. Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework, which can be used on its own or as part of generating the summary to overcome coverage problems.
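The general idea of similarity-weighted TextRank for extractive summarization can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how semantic similarity is computed, so the code assumes a simple bag-of-words cosine similarity as a stand-in (a real system would more likely use word or sentence embeddings). The function names `similarity_matrix`, `textrank`, and `summarize`, and the damping factor of 0.85 (the value commonly used for PageRank), are illustrative assumptions, and the scores use the normalized PageRank form with a `(1 - d) / n` teleport term.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased, whitespace-tokenized word counts (assumed stand-in for an embedding)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(sentences):
    """Symmetric edge weights between sentences; no self-loops."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(vectors)
    return [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def textrank(weights, damping=0.85, iterations=100, tol=1e-8):
    """Weighted PageRank by power iteration over a sentence graph.

    Nodes with no outgoing weight pass on no score, so the scores of a
    graph with isolated nodes need not sum to 1.
    """
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j shares its score in proportion to edge weight j -> i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def summarize(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(similarity_matrix(sentences))
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k)` on a list of sentence strings returns the `k` sentences that are most central in the similarity graph. Swapping `cosine` over word counts for a similarity computed on embeddings is the natural place to introduce semantic information, which is the gap in plain TextRank that the abstract points to.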

