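Project Title: Fast RNA Sequence Alignment Incorporating Secondary Structure Information
Project Number: 61003168
Project Type: Young Scientists Fund
Year Approved: 2011
Discipline: Metallurgy and Metal Processing
Project Author: Dandan Song
Affiliation: Beijing Institute of Technology
Funding: CNY 70,000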
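Chinese Abstract: Describing and predicting RNA secondary structure is the foundation for rapidly aligning RNAs, a diverse class of biological macromolecules with important functions, and therefore for studying what they do. This project proposed a conditional random field (CRF) based model for describing RNA secondary structure, together with a prediction method. It completed the theoretical formulation of the problem, construction of the mathematical model, derivation and verification of the formulas, implementation of the programs, and preparation of experimental data and validation. These results are currently being prepared for publication. In addition, the theoretical basis of the project was applied to the analysis of structured data in web pages, yielding an algorithm that extracts the core content block of a web page by computing text density over the DOM tree. This work was accepted as a full paper at ACM SIGIR 2011 (the only Class A conference in information retrieval on the China Computer Federation's list of recommended international conferences; SCI indexed; acceptance rate 19.8%; impact factor 2.33), and a Chinese patent, "Method for Determining the Core Block of a Web Page Based on DOM Node Text Density", has been filed.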
Chinese Keywords: conditional random field; RNA secondary structure; web page core block extraction; DOM tree; text density
English Abstract: Describing and predicting RNA secondary structure is the basis for rapid alignment and functional analysis of RNAs, a diverse class of biological macromolecules with significant functions. With the support of this project, we proposed a Conditional Random Field (CRF) based method for describing and predicting RNA secondary structure. The theoretical formulation of the problem, mathematical modeling, derivation and verification of the equations, program implementation, and experimental setup and validation have been completed, and a paper is in preparation. Meanwhile, we applied the theoretical basis of the project to the extraction of structured data from web pages and proposed a DOM-based content extraction method via text density. This work was published as a full paper at ACM SIGIR 2011, the only information retrieval conference rated Class A by the China Computer Federation (CCF) (SCI indexed, acceptance rate 19.8%, impact factor 2.33). A domestic patent, "A Content Extraction Method Based on Text Density of DOM Nodes", has been filed.
English Keywords: Conditional Random Field (CRF); RNA secondary structure; content extraction; DOM tree; text density
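The abstracts name text density over DOM nodes but give no formula. The following is a minimal Python sketch. It assumes that a node's text density is the number of non-whitespace-padded characters in its subtree divided by the number of tags beneath it, with a floor of 1. The helper names, the `min_chars` threshold, and the naive "pick the densest node" rule are hypothetical simplifications for illustration, not the published algorithm, which may refine this further. The sketch uses only the standard library `xml.etree.ElementTree`, so it assumes well-formed XHTML input.

```python
import xml.etree.ElementTree as ET

def char_count(elem):
    # Characters of all text in elem's subtree, ignoring leading/trailing
    # whitespace of each text fragment (formatting newlines do not count).
    return sum(len(t.strip()) for t in elem.itertext())

def tag_count(elem):
    # Number of descendant tags below elem (elem itself is not counted).
    return sum(1 for _ in elem.iter()) - 1

def text_density(elem):
    # Assumed definition: characters per tag; a node with no descendant
    # tags divides by 1 so leaves do not cause division by zero.
    return char_count(elem) / max(tag_count(elem), 1)

def densest_block(root, min_chars=20):
    # Hypothetical, naive selection: the element with the highest density
    # among those holding at least `min_chars` characters.
    candidates = [e for e in root.iter() if char_count(e) >= min_chars]
    return max(candidates, key=text_density)

SAMPLE_PAGE = """<html><body>
<div id="nav"><a>Home</a><a>News</a><a>About</a></div>
<div id="main"><p>RNA secondary structure prediction is studied here in depth.</p></div>
</body></html>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_PAGE)
    for elem in root.iter():
        print(elem.tag, elem.get("id"), round(text_density(elem), 2))
    print("core block:", densest_block(root).get("id"))  # main
```

Navigation-like blocks made of many short links score low because each tag carries few characters, while a paragraph-heavy block scores high. That contrast is the intuition that text-density-based extraction relies on.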