Research on Automatic Indexing of Hierarchical Book Topics - Zhuanzhi Fund

Books · Topic extraction

December 31, 2013

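National Natural Science Fund of China

National Natural Science Foundation of China (NSFC)

Project title: Research on Automatic Indexing of Hierarchical Book Topics

Project number: No.71303089

Project type: Young Scientists Fund

Year approved: 2014

Discipline: Management Science

Project author: Chen Jing (陈静)

Institution: Central China Normal University (华中师范大学)

Funding: CNY 200,000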

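Chinese abstract: With the rapid growth of e-book resources, the conflict between the coarse granularity of current automatic book topic indexing and the increasingly fine-grained needs of information users has become more and more acute, and automatic indexing of hierarchical book topics is an effective way to resolve it. Building on a review of the theory and an analysis of user needs, this project aims to construct a model for automatic indexing of hierarchical book topics, together with a supporting system of methods. First, it designs a table of contents (TOC) recognition algorithm that combines machine learning and semantic analysis to extract TOC features and marking rules from books. Next, it develops a method for partitioning the hierarchical topic structure of a book: TOC recognition and fuzzy retrieval divide the book into a coarse topic structure, and a hierarchical topic model combined with cluster analysis is then applied to the smallest logical units of that coarse structure to partition them into hierarchical topic structures and index their topics. Then, a topic-information extraction method based on probabilistic topic models extracts the topic information of each logical unit in the coarse structure, completing the automatic indexing of hierarchical book topics. The goal is to refine the granularity of book information research, broaden the scope of research on book information organization, and advance the management and use of book information resources.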

Chinese keywords: hierarchical topic; automatic indexing; book; topic structure partition; topic extraction
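The abstract describes TOC recognition only at a high level (machine learning plus semantic analysis that yields TOC features and marking rules). To make "marking rules" concrete, here is a minimal rule-based sketch, assuming TOC entries use Chinese chapter/section markers (第…章, 第…节) or decimal numbering, optionally followed by dot leaders and a page number. The patterns, `TocEntry`, `parse_toc_line`, and `recognize_toc` are hypothetical illustrations, not the project's algorithm.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class TocEntry(NamedTuple):
    level: int            # 1 = chapter, 2 = section, deeper levels from decimal numbering
    title: str            # heading text without numbering or page number
    page: Optional[int]   # trailing page number, if one was found


# Hypothetical marking rules (illustrative only, not the project's learned rules).
NUMERAL = r"[0-9一二三四五六七八九十百]+"
CHAPTER = re.compile(rf"^第{NUMERAL}章\s*(.*)$")    # e.g. 第1章 绪论
SECTION = re.compile(rf"^第{NUMERAL}节\s*(.*)$")    # e.g. 第二节 方法
DECIMAL = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")   # e.g. 1.2.1 Related work
# Optional dot leaders (ASCII dots, middle dots, ellipses) and a trailing page number.
PAGE = re.compile(r"[\s.·…]*(\d+)\s*$")


def parse_toc_line(line: str) -> Optional[TocEntry]:
    """Return a TocEntry if the line matches a marking rule, otherwise None."""
    text = line.strip()
    page = None
    m = PAGE.search(text)
    if m and m.start() > 0:  # ignore lines that consist only of a number
        page = int(m.group(1))
        text = text[: m.start()].rstrip()
    if c := CHAPTER.match(text):
        return TocEntry(1, c.group(1).strip(), page)
    if s := SECTION.match(text):
        return TocEntry(2, s.group(1).strip(), page)
    if d := DECIMAL.match(text):
        depth = d.group(1).count(".") + 1  # "1.2.1" -> level 3
        return TocEntry(depth, d.group(2).strip(), page)
    return None


def recognize_toc(lines: Iterable[str]) -> List[TocEntry]:
    """Keep only the lines that look like TOC entries under the rules above."""
    return [e for line in lines if (e := parse_toc_line(line)) is not None]
```

For example, `parse_toc_line("1.1 研究背景 3")` yields a level-2 entry titled 研究背景 on page 3. Purely pattern-based rules like these will also match body sentences that happen to begin with a number; filtering such false positives with learned features (position, layout, page-number alignment) is one plausible role for the machine-learning component the abstract mentions.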

English abstract: With the rapid growth of electronic book resources, the conflict between the coarse granularity of current book topic indexing and the increasingly fine-grained needs of information users is becoming more serious. Combining book topic structure partition with the extraction of hierarchical book topics to index book hierarchical topics (BHT) is an effective way to resolve this conflict. Building on a review of the theory and an analysis of user needs, this project aims to build an automatic indexing model for BHT, together with its methods, drawing on theories and methods from artificial intelligence and data mining. First, an algorithm for table of contents (TOC) recognition that combines machine learning and semantic analysis is designed to mine the features and marking rules of TOCs. Then, the BHT structure is partitioned in two steps. The first step partitions the book into a coarse structure using a fuzzy retrieval model and the results of TOC recognition; the second step applies a hierarchical topic model and cluster analysis to the lowest-level text fragments produced by the first step, partitioning them into hierarchical topic structures and indexing them. Finally, topics are extracted and indexed for the coarse structure of the book using an algorithm based on a probabilistic topic model. Thus, automatic indexing of …

English keywords: hierarchical topic; automatic indexing; book; topic structure partition; topic extraction
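The first partition step locates each recognized TOC heading in the body text and cuts the book into its coarse structure. The abstract does not specify the fuzzy retrieval model, so the sketch below substitutes plain character-level similarity from Python's standard `difflib` module; this is a swapped-in technique, and `partition_by_toc` and the 0.8 threshold are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib ratio).

    Stand-in for the unspecified fuzzy retrieval model; not the project's method.
    """
    return SequenceMatcher(None, a.strip(), b.strip()).ratio()


def partition_by_toc(
    titles: Sequence[str], body_lines: Sequence[str], threshold: float = 0.8
) -> List[Tuple[str, List[str]]]:
    """Hypothetical helper: split body_lines into (title, lines) segments.

    Titles are matched in reading order: each title is searched for only after
    the line where the previous title was found. A title with no body line
    scoring at least `threshold` is skipped.
    """
    anchors: List[Tuple[str, int]] = []
    pos = 0
    for title in titles:
        for i in range(pos, len(body_lines)):
            if similarity(title, body_lines[i]) >= threshold:
                anchors.append((title, i))
                pos = i + 1
                break
    segments: List[Tuple[str, List[str]]] = []
    for k, (title, start) in enumerate(anchors):
        # A segment runs from the line after its heading up to the next matched heading.
        end = anchors[k + 1][1] if k + 1 < len(anchors) else len(body_lines)
        segments.append((title, list(body_lines[start + 1 : end])))
    return segments
```

Searching strictly in reading order keeps the segments contiguous and non-overlapping; the trade-off is that one wrong early match, such as a running header that repeats a chapter title, shifts every later search forward. The resulting segments correspond to the "smallest logical units" that the second step would hand to a hierarchical topic model and clustering.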
