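Project title: Research on Chinese Prosodic Structure Based on Punctuation Information and Tree Structure
Project number: No. 61005053
Project type: Young Scientists Fund project
Year of approval: 2011
Project discipline: Light industry, handicrafts
Project author: Qian Yili (钱揖丽)
Author affiliation: Shanxi University (山西大学)
Project funding: 70,000 yuan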
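Abstract (translated from the Chinese): Prosody plays an important role in spoken expression. Computer-synthesized speech is still not natural enough: it sounds mechanical and has a poor sense of rhythm, and its main shortcomings lie in prosody. Segmenting prosody appropriately and correctly capturing the prosodic structure of an utterance is therefore key to improving the naturalness of synthesized speech, is a prerequisite for human-machine dialogue and artificial intelligence, and has great practical significance. In related work in China and abroad, large text corpora annotated with prosodic structure are generally obtained by manual annotation. This costs a great deal of labor and time, and because it relies mainly on subjective perception, the annotation process is hard to standardize, and both the process and its results are easily influenced by the annotators' subjective knowledge. To address these problems, this project explores a method for automatically segmenting Chinese prosodic structure that requires no prosody-annotated corpus. The main research topics are: obtaining prosodic structure information from the punctuation marks in Chinese text; using punctuation positions to approximate prosodic structure boundaries; methods for representing a linearly ordered Chinese sentence as a tree structure; and automatically predicting prosodic structure from punctuation information and the tree representation.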
Keywords (translated from the Chinese): punctuation; tree structure; prosodic structure; Chinese information processing
English abstract: Prosody plays an important role in language expression. At present, the naturalness of computer-synthesized speech is unsatisfactory: it sounds mechanical and has a poor sense of rhythm, and its major drawback lies in prosody. Therefore, the key to improving the naturalness of synthesized speech lies in segmenting prosody correctly and capturing the correct prosodic structure of speech. This is also a prerequisite for man-machine dialogue and artificial intelligence, and it has great practical significance. In related studies, researchers generally label prosodic structures manually to obtain a large-scale corpus and then base their work on it. On the one hand, the manual method consumes a lot of manpower and time; on the other hand, because it relies mainly on subjective perception, the labeling process is hard to standardize, and both the process and its results are easily affected by the annotators' subjective knowledge. To address these problems, this project explores an automatic segmentation method for Chinese prosodic structure that requires no prosody-labeled corpus. The main research contents include: acquiring prosodic information from punctuation marks in Chinese texts; simulating prosodic structure boundaries based on the locations of punctuation marks; converting sentences from a linear structure to a tree structure; and automatically predicting prosodic structure using punctuation information and the tree structure.
English keywords: punctuation marks; tree structure; prosodic structure; Chinese information processing
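
Illustrative note: the abstract's idea of using punctuation positions as stand-ins for prosodic boundaries might be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the project's actual method: the set of punctuation marks, the character-level labeling (real systems would more likely work on segmented words), and the names `BOUNDARY_PUNCTUATION`, `punctuation_boundaries`, and `boundary_chunks` are all hypothetical and do not come from the record.

```python
# Assumption: which marks count as boundary cues. The record does not list them.
# Both full-width and ASCII forms are included, since Chinese text often mixes them.
BOUNDARY_PUNCTUATION = set(",,。;;::、??!!")


def punctuation_boundaries(text):
    """Strip punctuation and return (chars, labels).

    labels[i] is 1 when chars[i] was immediately followed by a boundary
    punctuation mark in the original text, else 0. These labels act as
    weak, automatically derived boundary annotations.
    """
    chars, labels = [], []
    for ch in text:
        if ch in BOUNDARY_PUNCTUATION:
            if labels:
                labels[-1] = 1  # the preceding character ends a chunk
        elif not ch.isspace():
            chars.append(ch)
            labels.append(0)
    return chars, labels


def boundary_chunks(chars, labels):
    """Group characters into chunks that end at each labeled boundary."""
    chunks, current = [], []
    for ch, label in zip(chars, labels):
        current.append(ch)
        if label:
            chunks.append("".join(current))
            current = []
    if current:  # trailing text with no closing punctuation
        chunks.append("".join(current))
    return chunks


chars, labels = punctuation_boundaries("今天天气好,我们去公园。")
print(labels)
print(boundary_chunks(chars, labels))
```

Labels produced this way require no manual annotation, which is the property the abstract emphasizes; how such labels would then be combined with a tree representation of the sentence is not specified in the record and is left out of the sketch.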