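Project title: Research on Probability-Based Distance Metrics for Nominal Attributes
Project number: No.61203287
Project type: Young Scientists Fund
Year of approval: 2013
Discipline: Automation
Project author: Li Chaoqun (李超群)
Affiliation: China University of Geosciences (Wuhan)
Funding: 240,000 yuan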
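Abstract (translated from the Chinese): Distance metrics are at the core of distance-based machine learning algorithms, and many distance-related algorithms depend on a good distance function for their success. Distance metrics for nominal attributes are more complex than those for numerical attributes. This project studies probability-based distance metrics for nominal attributes. The research covers: 1) taking the attribute independence assumption of the naive Bayes model as the starting point, a theoretical analysis of how that assumption affects distance functions; 2) using Bayesian networks and decision tree models to express the dependencies among attributes, and introducing those dependencies into distance functions to construct new distance functions that perform better on data with strong attribute dependencies; 3) studying the class probability estimation ability of Bayesian networks and decision tree models, possibly proposing new class probability estimation models, and using them to compute the class membership probabilities in probability-based distance functions, thereby improving the performance of the related distance functions. The project is the first to use Bayesian networks and decision tree models to study distance metrics. It can provide an example for research on new probability-based distance metrics for nominal attributes and promote the application of probability-based distance functions, and thus has significant theoretical and practical value.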
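Keywords (translated from the Chinese): distance metrics; nominal attributes; attribute independence assumption; attribute dependence relationship; class probability estimation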
English abstract: Distance metrics play a key role in distance-related learning algorithms, and many such algorithms depend on a good distance metric to be successful. Compared with distance metrics for numerical attributes, distance metrics for nominal attributes are relatively poorly understood. In this project, we work on probability-based distance metrics for nominal attributes. The main research contents include: 1) We start from the attribute independence assumption in the naive Bayes model and analyze the influence of this assumption on the performance of distance metrics; 2) We investigate the attribute dependence relationships expressed by Bayesian networks and decision tree models, and incorporate these relationships into distance metrics to propose new distance metrics that perform well on data with strong dependence relationships between attributes; 3) We study the class probability estimation ability of Bayesian networks and decision tree models, possibly propose new class probability estimation models, and apply them to calculate the class membership probabilities in probability-based distance metrics, consequently improving the performance of the relevant distance metrics. In this project, we are the first to apply Bayesian networks and decision tree models to the st
English keywords: distance metrics; nominal attributes; attribute independence assumption; attribute dependence relationship; class probability estimation
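
Illustrative sketch: the record does not give a formula for any specific distance function, so the following is only a minimal example of what a probability-based distance for nominal attributes can look like. It uses the classic Value Difference Metric (VDM), which compares two attribute values through the class membership probabilities estimated for each value; it is not the method proposed by the project. The function names `fit_vdm` and `vdm_distance`, the exponent `q`, and the toy data are all assumptions made for illustration. Note that VDM sums over attributes one at a time, which is one place where an attribute independence assumption enters such a metric.

```python
from collections import Counter, defaultdict


def fit_vdm(rows, labels):
    """Estimate P(class | attribute index, value) by relative frequency."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)  # (attribute index, value) -> class counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(i, value)][label] += 1
    probs = {}
    for key, class_counts in counts.items():
        total = sum(class_counts.values())
        probs[key] = [class_counts[c] / total for c in classes]
    return classes, probs


def vdm_distance(x, y, model, q=1):
    """Sum over attributes and classes of |P(c | x_i) - P(c | y_i)| ** q."""
    classes, probs = model
    # Assumption: unseen values fall back to a uniform class distribution.
    uniform = [1.0 / len(classes)] * len(classes)
    total = 0.0
    for i, (xv, yv) in enumerate(zip(x, y)):
        px = probs.get((i, xv), uniform)
        py = probs.get((i, yv), uniform)
        total += sum(abs(a - b) ** q for a, b in zip(px, py))
    return total


# Toy data (hypothetical): colour determines the class, shape does not.
rows = [("red", "round"), ("red", "square"),
        ("blue", "round"), ("blue", "square")]
labels = ["pos", "pos", "neg", "neg"]
model = fit_vdm(rows, labels)

d_far = vdm_distance(("red", "round"), ("blue", "square"), model)
d_near = vdm_distance(("red", "round"), ("red", "square"), model)
print(d_far, d_near)
```

In this toy data the two colours have opposite class distributions while the two shapes have identical ones, so instances that differ only in shape come out at distance zero. Replacing the frequency estimates in `fit_vdm` with probabilities from a stronger class probability estimator is the kind of substitution the abstract's third research item describes, though the record does not specify how the project does this.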