Variable selection and missing data imputation in categorical genomic data analysis by integrated ridge regression and random forest


Data imputation · Ridge regression · Integration · Random forest · Missing values

November 10, 2021


Siru Wang,Guoqi Qian

Genomic data arising from a genome-wide association study (GWAS) are often not only large-scale but also incomplete. A specific form of this incompleteness is missing values with a non-ignorable missingness mechanism. The intrinsic complications of genomic data present significant challenges in developing an unbiased and informative procedure for phenotype-genotype association analysis by a statistical variable selection approach. In this paper we develop a coherent procedure for categorical phenotype-genotype association analysis, in the presence of missing values with a non-ignorable missingness mechanism in GWAS data, by integrating state-of-the-art methods: random forest for variable selection, weighted ridge regression with the EM algorithm for missing data imputation, and linear statistical hypothesis testing for determining the missingness mechanism. Two simulated GWAS are used to validate the performance of the proposed procedure. The procedure is then applied to analyze a real data set from a breast cancer GWAS.
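As a rough illustration of the imputation idea only, the sketch below alternates between fitting a weighted ridge regression and refilling missing values with the fitted predictions. It is not the authors' procedure: the abstract does not give the algorithm's details, so the function names (`solve_linear`, `weighted_ridge`, `impute_response`), the unit weights, the numeric response, and the mean-based starting value are all assumptions made for this example. It also ignores the non-ignorable missingness mechanism, the categorical phenotype, and the random forest selection step.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def weighted_ridge(X, y, w, lam):
    """Minimize sum_i w_i * (y_i - x_i . beta)^2 + lam * ||beta||^2 (no intercept)."""
    n, p = len(X), len(X[0])
    # Normal equations: (X^T W X + lam * I) beta = X^T W y.
    A = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(w[i] * X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve_linear(A, b)


def impute_response(X, y, lam=0.0, n_iter=50):
    """Hypothetical EM-style loop: fit ridge on completed data, refill missing y."""
    observed = [v for v in y if v is not None]
    start = sum(observed) / len(observed)      # assumed starting value: observed mean
    completed = [start if v is None else float(v) for v in y]
    w = [1.0] * len(y)                         # assumed unit weights
    beta = None
    for _ in range(n_iter):
        beta = weighted_ridge(X, completed, w, lam)
        for i, v in enumerate(y):
            if v is None:                      # only missing entries are updated
                completed[i] = sum(X[i][j] * beta[j] for j in range(len(beta)))
    return completed, beta


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0], [4.0]]
    y = [2.0, 4.0, None, 8.0]                  # third value is missing
    completed, beta = impute_response(X, y)
    print(completed, beta)
```

In the toy data the observed values follow y = 2x exactly, so with `lam=0.0` the imputed third value settles close to 6. A positive `lam` shrinks the coefficient and therefore the imputed value toward zero.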

