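Project title: Semiparametric models for factors affecting diagnostic accuracy with covariates missing at random and covariates subject to measurement error
Project number: No. 11501472
Project type: Young Scientists Fund
Approval year: 2016
Discipline: Mathematical and Physical Sciences and Chemistry
Project author: Baoying Yang (杨宝莹)
Author affiliation: Southwest Jiaotong University (西南交通大学)
Funding amount: CNY 180,000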
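Abstract (translated from Chinese): The ROC curve is a comprehensive method for evaluating the accuracy of diagnostic tests and has been widely applied in clinical medicine and other fields. The area under the ROC curve (AUC) is a single-value summary index of diagnostic accuracy. In practice, many factors may affect diagnostic accuracy (the AUC index), but the effects of some of them are not significant. Including such factors in the model reduces estimation efficiency and impairs the model's predictive ability, so variable selection is necessary. Using the generalized partially linear varying-coefficient model, this project will study the effect of covariates on diagnostic accuracy under complete data and under complex data types such as covariates missing at random and covariates with measurement error. It will propose more efficient estimation methods for the model and verify them through theoretical proofs and numerical simulation. It will also explore suitable variable selection methods for choosing the subset of variables that affects the AUC estimate, so that the mean squared error of the AUC estimate is minimized, and it will give a robust estimator of the AUC index. The proposed methods will then be applied to real data. This research enriches and extends ROC analysis methods for complex data and provides a theoretical basis and technical support for clinical medicine and other fields related to diagnostic testing.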
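Keywords (translated from Chinese): Nonparametric function estimation; empirical likelihood method; FIC criterion; complex data; variable selection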
English abstract: As a well-accepted technique for assessing the accuracy of a diagnostic test, the ROC curve has been widely applied in fields such as clinical trial studies. The area under the ROC curve (AUC) is a popular single-number summary index of the discriminatory accuracy of a diagnostic test. In real data analysis, many covariates may affect the discriminatory accuracy, but not every covariate is important. Including all available covariates may reduce the model's explainability, so it is necessary to select the important ones. We will study the generalized varying-coefficient partially linear model with complete data, with covariates missing at random, and with covariates measured with error, respectively, and estimate the unknown parameters and functions using more efficient methods. The efficiency of the proposed methods will be demonstrated through large-sample theory and simulation studies. Furthermore, to select the covariates that may affect the estimate of the AUC index, a more suitable variable selection criterion will be developed such that the AUC estimator has minimal mean squared error, and a robust estimator of the AUC will be obtained. The proposed methods will be illustrated through real data analysis. Our study will develop ROC analysis methods for complex data sets and provide a theoretical basis and technical support for fields related to diagnostic tests, such as clinical trial studies.
English keywords: Nonparametric Function Estimation; Empirical Likelihood Method; Focused Information Criterion; Complex Data Sets; Variable Selection