项目名称: 确定性表达式及其子类的理论问题与工具研究
项目编号: No.61472405
项目类型: 面上项目
立项/批准年度: 2015
项目学科: 自动化技术、计算机技术
项目作者: 陈海明
作者单位: 中国科学院软件研究所
项目金额: 80万元
中文摘要: 确定性表达式在XML以及其它领域有着很多应用,然而相关的理论研究目前还远不能满足实际工作的需要。本项目将对其中迫切需要解决的问题进行深入、系统的研究。目前使用确定性表达式的最主要的困难在于这类表达式没有语法定义,这已成为制约确定性表达式应用以及在实际当中使用不符合规范的DTDs和XML Schemas的重要因素。另外在实际中,还经常使用确定性表达式的一些子类。因此,本项目研究确定性表达式及其子类的语法表示问题,和基于语法的确定性表达式生成工具。对于模式缺失的情况,则需要研究确定性表达式及子类的推断问题。对于确定性语言研究中极为重要的若干判定问题,我们将进行理论与算法研究,希望得到新的复杂度结论和算法。对于强确定性表达式,则希望得到线性判定算法,并通过建立确定性与强确定性表达式之间的联系,得到带数字出现的确定性表达式的自动机刻画。这些研究结果,将对确定性表达式的更好应用产生积极而深远的影响。
中文关键词: 确定性表达式;子类;XML;可判定性;算法
英文摘要: Deterministic regular expressions are widely used in XML and other fields. However the related theoretical studies for this kind of expressions currently cannot meet the need of practical work. This project will study several most important problems in the theoretical studies for deterministic expressions. Presently the most difficulty in using this kind of expressions is that they do not have a syntax definition. This has become an important reason for restricting the use of deterministic expressions, and also a reason for using DTDs and XML Schemas that do not conform specifications in practice. Meanwhile, most of DTDs and XML Schemas use subclasses of deterministic expressions, such as SOREs and CHAREs. Hence the main objectives of this project are to develop syntax definitions for deterministic expressions and their subclasses, and study tools for automatically generating deterministic expressions based on these definitions. For the situation where there is not any schema for the XML files, we will study how to infer deterministic expressions or their subclasses for these files. For some important decision problems in deterministic languages, we will try to investigate theoretical properties and develop new algorithms, and hope to get some computational complexity results. For strongly deterministic regular expressions, we want to establish the relations between strongly deterministic expressions and deterministic expressions. By using these relations, we intend to design a linear time algorithm for deciding strong determinism of regular expressions, and find automata whose determinism will indicates determinism of regular expressions with counting. All these theoretical results will greatly promote the use of deterministic regular expressions in practice.
英文关键词: Deterministic regular expressions;subclasses;XML;decidability;algorithms