SEAGLE: A Scalable Exact Algorithm for Large-Scale Set-Based GxE Tests in Biobank Data

May 15, 2021

Jocelyn T. Chi, Ilse C. F. Ipsen, Tzu-Hung Hsiao, Ching-Heng Lin, Li-San Wang, Wan-Ping Lee, Tzu-Pin Lu, Jung-Ying Tzeng

The explosion of biobank data offers immediate opportunities for gene-environment (GxE) interaction studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in GxE assessment, especially for set-based GxE variance component (VC) tests, which are a widely used strategy to boost overall GxE signals and to evaluate the joint GxE effect of multiple variants from a biologically meaningful unit (e.g., gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based GxE tests, to permit GxE VC tests for biobank-scale data. SEAGLE employs modern matrix computations to achieve the same "exact" results as the original GxE VC tests without imposing additional assumptions or relying on approximations. SEAGLE can easily accommodate sample sizes on the order of $10^5$, is implementable on standard laptops, and does not require specialized computing equipment. We demonstrate SEAGLE's performance through extensive simulations. We illustrate its utility by conducting genome-wide gene-based GxE analysis on the Taiwan Biobank data to explore the interaction of gene and physical activity status on body mass index.
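
The abstract does not spell out which matrix computations SEAGLE uses. To illustrate how an exact VC score statistic can be computed without ever forming an $n \times n$ matrix, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It assumes a common set-based GxE model $y = X\beta + Gb + \mathrm{diag}(E)Gc + \varepsilon$ with $b \sim N(0, \tau I_L)$, $c \sim N(0, \nu I_L)$, $\varepsilon \sim N(0, \sigma^2 I_n)$, and a score statistic $Q = \tfrac{1}{2} y^\top P \tilde G \tilde G^\top P y$ for $H_0: \nu = 0$, where $\tilde G = \mathrm{diag}(E)G$ and $P = V^{-1} - V^{-1}X(X^\top V^{-1}X)^{-1}X^\top V^{-1}$. The null variance components $\sigma^2, \tau$ are taken as given (their estimation and the p-value computation are not shown), and the function names `apply_vinv`, `gxe_score_stat`, and `dense_score_stat` are hypothetical. The swapped-in technique is the Woodbury identity $(\sigma^2 I_n + \tau G G^\top)^{-1} = \sigma^{-2}\left[I_n - G\left(\tfrac{\sigma^2}{\tau} I_L + G^\top G\right)^{-1} G^\top\right]$, which reduces the work to $L \times L$ solves.

```python
import numpy as np


def apply_vinv(M, G, sigma2, tau):
    """Return V^{-1} M for V = sigma2 * I_n + tau * G G^T (Woodbury identity).

    Only an L x L system is solved, so the cost grows roughly linearly in n
    instead of cubically; no n x n matrix is formed.
    """
    L = G.shape[1]
    small = (sigma2 / tau) * np.eye(L) + G.T @ G  # L x L, positive definite
    return (M - G @ np.linalg.solve(small, G.T @ M)) / sigma2


def gxe_score_stat(y, X, G, E, sigma2, tau):
    """Hypothetical VC score statistic Q = 0.5 * y^T P Gt Gt^T P y (fast path)."""
    Gt = E[:, None] * G                      # GxE columns: diag(E) @ G
    Vinv_y = apply_vinv(y, G, sigma2, tau)   # V^{-1} y, length n
    Vinv_X = apply_vinv(X, G, sigma2, tau)   # V^{-1} X, n x p
    # GLS estimate of the fixed effects under H0
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    r = Vinv_y - Vinv_X @ beta               # r = P y
    u = Gt.T @ r                             # length L
    return 0.5 * float(u @ u)


def dense_score_stat(y, X, G, E, sigma2, tau):
    """Reference version that forms V and P explicitly (O(n^3), small n only)."""
    n = len(y)
    V = sigma2 * np.eye(n) + tau * G @ G.T
    Vi = np.linalg.inv(V)
    P = Vi - Vi @ X @ np.linalg.solve(X.T @ Vi @ X, X.T @ Vi)
    Gt = E[:, None] * G
    u = Gt.T @ (P @ y)
    return 0.5 * float(u @ u)


if __name__ == "__main__":
    # Small simulated example; the data are synthetic and purely illustrative.
    rng = np.random.default_rng(0)
    n, L = 200, 10
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 covariates
    G = rng.binomial(2, 0.3, size=(n, L)).astype(float)          # genotype dosages 0/1/2
    E = rng.binomial(1, 0.5, size=n).astype(float)               # binary environment
    y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=n)
    print(gxe_score_stat(y, X, G, E, 1.0, 0.1))
    print(dense_score_stat(y, X, G, E, 1.0, 0.1))
```

Both functions should agree up to floating-point error; only the dense version needs memory and time that grow as $n^2$ and $n^3$, which is what makes it impractical at biobank scale.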
