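Project title: Hypothesis Testing for High-Dimensional Data
Project number: No.11271031
Project type: General Program (面上项目)
Approval year: 2013
Discipline: Mathematical and Physical Sciences, and Chemistry
Principal investigator: Wang Hansheng (王汉生)
Institution: Peking University (北京大学)
Funding: 500,000 CNY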
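Chinese abstract (translated): High-dimensional data are increasingly common in applied research, yet the development of corresponding statistical methods has lagged behind. Over the past 5-10 years, variable selection methods for high-dimensional data have advanced considerably, whereas hypothesis testing has seen little progress. Hypothesis testing has emerged as a research frontier only in the last one or two years. Building on existing research and the literature, this project will make substantial contributions in the following areas: (1) ultra-high-dimensional hypothesis testing under a factor structure, whereas existing results all assume no factor structure; (2) high-dimensional partial tests, whereas existing results consider only global tests; (3) hypothesis testing for high-dimensional data mining methods (for example, Naive Bayes), whereas existing results consider only classical regression or multivariate models; (4) hypothesis testing for large-scale network data, whereas existing results all rely on the independent and identically distributed assumption. The results of this project will greatly enrich and advance the existing theory of high-dimensional hypothesis testing.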
Chinese keywords (translated): Ultra-High-Dimensional Data; Hypothesis Testing; Factor Model; Network Structure; Naive Bayes
English abstract: In practice, high-dimensional data are becoming increasingly available, yet the relevant statistical methods are not well developed. During the past 5-10 years, much progress has been made on high-dimensional variable selection methods. However, much less has been done on the corresponding testing problems, which have become a frontier research topic only in the past one or two years. Building on current research and the literature, this study intends to make important contributions in the following areas: (1) ultra-high-dimensional data analysis with a factor structure (most current results assume no factor structure); (2) high-dimensional partial tests (most existing methods are for global tests); (3) high-dimensional data mining methods (past studies mainly considered classical regression or multivariate models); and (4) large-scale network data (most of the existing literature relies on independence assumptions). As a result, the outputs of this study would substantially enrich the theory of high-dimensional testing.
English keywords: High Dimensional Data; Hypothesis Testing; Factor Model; Network Structure; Naive Bayes