Simulating High-Dimensional Multivariate Data using the bigsimr R Package


November 11, 2021


A. Grant Schissler, Edward J. Bedrick, Alexander D. Knudson, Tomasz J. Kozubowski, Tin Nguyen, Juli Petereit, Walter W. Piegorsch, Duc Tran

From arXiv. 22 pages, 10 figures. CRAN package page: https://cran.r-project.org/web/packages/bigsimr/index.html

Accurate data simulation is critical when employing Monte Carlo techniques and evaluating statistical methodology. In this era of big data, measurements are often correlated and high-dimensional, as with data obtained in high-throughput biomedical experiments. Because of the computational complexity and the lack of user-friendly software for simulating these massive multivariate constructions, researchers resort to simulation designs that assume independence or apply arbitrary data transformations. To close this gap, we developed the Bigsimr Julia package with R and Python interfaces; this paper focuses on the R interface. These packages enable high-dimensional random vector simulation with arbitrary marginal distributions and dependence specified via a Pearson, Spearman, or Kendall correlation matrix. bigsimr includes high-performance features such as multi-core and graphics-processing-unit (GPU) accelerated algorithms to estimate correlation and compute the nearest correlation matrix. Monte Carlo studies quantify the accuracy and scalability of our approach for dimensions up to $d = 10{,}000$. We describe example workflows and apply the package to a high-dimensional data set: RNA-sequencing data obtained from breast cancer tumor samples.
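The abstract does not spell out the simulation algorithm. One standard way to combine arbitrary margins with a prescribed Spearman or Kendall correlation is the Gaussian-copula (NORTA-style) construction. First, convert each target rank correlation to the Pearson correlation $r$ of a latent normal vector, using $\rho_S = \tfrac{6}{\pi}\arcsin(r/2)$ and $\tau = \tfrac{2}{\pi}\arcsin(r)$. Next, draw correlated normals through a Cholesky factor, map them to uniforms with the normal CDF, and apply each margin's inverse CDF. The Python sketch below illustrates that idea under stated assumptions. It is not bigsimr's code or API, and the names `simulate_vectors`, `exponential_quantile`, and `poisson_quantile` are hypothetical. It also omits the nearest-correlation-matrix step. If the converted matrix is not positive definite, `cholesky` simply raises an error, and that is the point where a routine like bigsimr's nearest-correlation-matrix computation would be needed. Finally, it does not handle Pearson targets for non-normal margins, which require a separate matching step.

```python
# Minimal Gaussian-copula sketch (hypothetical names; NOT the bigsimr API).
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist(0.0, 1.0)


def spearman_to_pearson(rho_s):
    # Gaussian copula: Spearman rho_s = (6/pi) * asin(r/2); solve for r.
    return 2.0 * math.sin(math.pi * rho_s / 6.0)


def kendall_to_pearson(tau):
    # Gaussian copula: Kendall tau = (2/pi) * asin(r); solve for r.
    return math.sin(math.pi * tau / 2.0)


def cholesky(a):
    """Return lower-triangular L with L L^T == a.

    Raises ValueError if `a` is not positive definite; a nearest-correlation-
    matrix adjustment (not implemented here) would be needed in that case.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("correlation matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_vectors(n, rank_corr, quantile_fns, rng, kind="spearman"):
    """Draw n vectors with margins given by `quantile_fns` (inverse CDFs) and
    rank correlation `rank_corr` (Spearman or Kendall) via a Gaussian copula."""
    if kind == "spearman":
        convert = spearman_to_pearson
    elif kind == "kendall":
        convert = kendall_to_pearson
    else:
        raise ValueError("kind must be 'spearman' or 'kendall'")
    d = len(rank_corr)
    # 1. Rank correlation -> Pearson correlation of the latent normal vector.
    latent = [[1.0 if i == j else convert(rank_corr[i][j]) for j in range(d)]
              for i in range(d)]
    L = cholesky(latent)
    eps = 1e-12
    out = []
    for _ in range(n):
        # 2. Correlated standard normals: x = L z.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        # 3. Normal CDF -> uniforms (clamped away from 0 and 1) -> target margins.
        u = [min(max(_STD_NORMAL.cdf(xi), eps), 1.0 - eps) for xi in x]
        out.append([q(ui) for q, ui in zip(quantile_fns, u)])
    return out


def exponential_quantile(rate):
    # Inverse CDF of Exponential(rate).
    return lambda u: -math.log1p(-u) / rate


def poisson_quantile(lam):
    # Inverse CDF of Poisson(lam) by summing the pmf (adequate for small lam).
    def q(u):
        k = 0
        p = math.exp(-lam)
        cdf = p
        while cdf < u:
            k += 1
            p *= lam / k
            cdf += p
        return k
    return q


if __name__ == "__main__":
    rng = random.Random(42)
    rho = [[1.0, 0.5], [0.5, 1.0]]  # target Spearman correlation
    draws = simulate_vectors(5, rho, [exponential_quantile(2.0), poisson_quantile(4.0)], rng)
    for row in draws:
        print(row)
```

Because the rank-correlation conversions are exact only for continuous margins, discrete margins such as the Poisson example show ties, so their realized rank correlation only approximates the target. The pure-Python Cholesky factorization is $O(d^3)$ and is meant for illustration. At the dimensions the abstract reports, optimized linear algebra (and the multi-core or GPU routines it mentions) would be required.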
