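Theoretical and Applied Research on the Factor Mixture Analyzer within a Nonparametric Bayesian Framework

Project name: Theoretical and Applied Research on the Factor Mixture Analyzer within a Nonparametric Bayesian Framework

Project number: No.61201326

Project type: Young Scientists Fund project

Year approved: 2013

Discipline: Electronics and Information Systems

Project author: Wei Xin (魏昕)

Affiliation: Nanjing University of Posts and Telecommunications (南京邮电大学)

Funding: 270,000 yuan (RMB)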

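Chinese abstract (translated): With advances in acquisition and storage technology, high-dimensional observations and high-dimensional feature vectors keep emerging. While they provide more information, they inevitably pose severe challenges to processing tools and methods. The factor mixture analyzer is a recently proposed, representative tool that performs dimensionality reduction while carrying out the data processing task. However, existing factor mixture analyzers based on the maximum likelihood criterion have limitations in both model structure and parameters. This project will therefore first study the factor mixture analyzer theoretically within a nonparametric Bayesian framework, covering mainly the construction of the model and the derivation of the associated learning algorithms. Once the theoretical work is complete, the resulting nonparametric Bayesian factor mixture analyzer will be embedded in a hidden Markov model and applied to speaker diarization (speaker segmentation and clustering), a current hot topic in speech signal processing. The nonparametric Bayesian factor mixture analyzer proposed in the theoretical part is expected to determine a suitable model structure and suitable parameter distributions automatically from the data, and thus to handle high-dimensional data more accurately and flexibly. The applied results should help further improve the performance of speaker diarization systems and offer new ideas and methods for related application problems in information science.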

Chinese keywords (translated): factor mixture analyzer; nonparametric Bayesian; distributed estimation; speaker segmentation

English abstract: With the development of acquisition and storage technologies, high-dimensional observed data and high-dimensional feature vectors are emerging rapidly. Although these high-dimensional data provide more information than before, they inevitably pose serious challenges to existing processing tools and approaches. The factor mixture analyzer (FMA) is a representative tool that has recently been proposed to perform data processing and dimension reduction simultaneously. However, the model and the related parameter estimation algorithm of the FMA are based on the maximum likelihood criterion and therefore have limitations in model structure and parameters. In this proposal, we will first study the FMA from the perspective of Bayesian nonparametrics, which includes the establishment of two new models (BFMA, BtFMA) and the derivation of the related inference algorithms. After the theoretical research is finished, we will combine the BFMA/BtFMA with hidden Markov models and apply the resulting approaches to speaker diarization, one of the most important problems in speech signal processing. As a result of this research, the proposed BFMA and BtFMA are expected to determine an appropriate structure and the posteriors of the parameters automatically. They will provide new tools for processing high-dimensional da

English keywords: factor mixture analyzer; nonparametric Bayesian; distributed estimation; speaker diarization
