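Project title: Sure Independence Screening Methods for Ultrahigh Dimensional Data Analysis: Statistical Theory and Applications
Project number: No.11301435
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Mathematical, Physical and Chemical Sciences
Project author: Zhong Wei (钟威)
Affiliation: Xiamen University (厦门大学)
Funding: 230,000 CNY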
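Abstract (translated from Chinese): With the rapid development of modern information technology, researchers can now collect ultrahigh dimensional data effectively. How to extract useful information from complex ultrahigh dimensional data has become an active topic in international scientific research, and it brings both new challenges and new opportunities to statistics. In ultrahigh dimensional data, the number of predictors is often far larger than the sample size, so traditional variable selection methods and penalized regression methods for high dimensional data are no longer applicable. This project studies sure independence screening methods for selecting important variables from ultrahigh dimensional data, together with their theory and applications. First, for ultrahigh dimensional data with a categorical response, we propose a new sure independence screening method based on the mean variance of the conditional distribution function of each predictor and study its theoretical properties, filling a gap in the existing literature on ultrahigh dimensional categorical data. Second, for ultrahigh dimensional data with outliers and heteroscedasticity, we propose a robust sure independence screening method based on distance correlation; in practice it improves the robustness of existing methods, and in theory it removes their reliance on distributional assumptions about the variables. Finally, we apply these methods to ultrahigh dimensional gene data in genetics, providing an analytical tool for screening the important genes that affect a given hereditary trait or disease.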
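Keywords (translated from Chinese): cumulative distribution function; sure independence screening; sure screening property; ultrahigh dimensionality; variable selection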
English abstract: With the advent of modern data collection technology, researchers in diverse scientific fields are able to collect ultrahigh dimensional data effectively. How to extract useful information from complex ultrahigh dimensional data has become an active research topic, and it brings both new challenges and new opportunities to statistical research. In ultrahigh dimensional data, the number of predictors greatly exceeds the sample size, which makes traditional variable selection techniques and high dimensional penalized regression approaches practically infeasible. This project aims to develop new sure independence screening approaches for selecting important variables in ultrahigh dimensional data, together with their theoretical properties and applications. First, we propose a novel model-free sure independence screening procedure based on the mean variance of the conditional distribution function (MV-SIS) for ultrahigh dimensional data analysis when the response is categorical, and establish its theoretical properties, which will contribute to the literature on ultrahigh dimensional categorical data. Second, a new robust sure independence screening procedure via distance correlation (DC-RoSIS) is proposed to enhance the robustness of the existing DC-SIS approach. This method is practically robust for ultrahigh d
English keywords: cumulative distribution function; sure independence screening; sure screening property; ultrahigh dimensionality; variable selection
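
Illustrative sketch: the abstract describes screening predictors by the mean variance of the conditional distribution function when the response is categorical. The record does not give the estimator, so the following is only a minimal sketch under an assumed sample form, MV(x | y) = sum over classes r of p_r * (1/n) * sum over j of [F_r(x_j) - F(x_j)]^2, where F is the empirical CDF of the predictor and F_r is its empirical CDF within class r. The function names `ecdf_at`, `mv_index`, and `mv_screen`, and the cutoff `d`, are hypothetical and chosen for illustration; this is not presented as the project's implementation.

```python
import random


def ecdf_at(sample, points):
    """Empirical CDF of `sample`, evaluated at each value in `points`."""
    m = len(sample)
    return [sum(1 for s in sample if s <= t) / m for t in points]


def mv_index(x, y):
    """Assumed sample mean-variance index between a numeric predictor x
    and a categorical response y (larger means stronger dependence)."""
    n = len(x)
    f_all = ecdf_at(x, x)  # unconditional empirical CDF at each x_j
    total = 0.0
    for label in set(y):
        group = [xi for xi, yi in zip(x, y) if yi == label]
        p_r = len(group) / n          # class proportion
        f_r = ecdf_at(group, x)       # within-class empirical CDF at each x_j
        total += p_r * sum((a - b) ** 2 for a, b in zip(f_r, f_all)) / n
    return total


def mv_screen(columns, y, d):
    """Rank predictors (given as a list of columns) by mv_index and
    return the indices of the top d together with all scores."""
    scores = [mv_index(col, y) for col in columns]
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return order[:d], scores


if __name__ == "__main__":
    # Toy data (simulated, not from the project): only predictor 0 depends on y.
    random.seed(0)
    n, p = 100, 30
    y = [random.randint(0, 1) for _ in range(n)]
    cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    cols[0] = [v + 2.0 * yi for v, yi in zip(cols[0], y)]
    keep, _ = mv_screen(cols, y, d=5)
    print("retained predictor indices:", keep)
```

Because the index depends on each predictor only through empirical distribution functions, the ranking does not require a regression model linking the predictors to the response, which is the sense in which such a screening step is called model-free. The choice of `d` is left to the user in this sketch.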