Adapting machine learning algorithms to better handle the presence of natural clustering or batch effects within training datasets is imperative across a wide variety of biological applications. This article considers the effect of ensembling Random Forest learners trained on clusters within a single dataset with heterogeneity in the distribution of the features. We find that constructing ensembles of forests trained on clusters determined by algorithms such as k-means results in significant improvements in accuracy and generalizability over the traditional Random Forest algorithm. We call our novel approach the Cross-Cluster Weighted Forest, and examine its robustness across various data-generating scenarios and outcome models. Furthermore, we explore the influence of the data-partitioning and ensemble-weighting strategies on conferring the benefits of our method over the existing paradigm. Finally, we apply our approach to cancer molecular profiling and gene expression datasets that are naturally divisible into clusters and illustrate that our approach outperforms the classic Random Forest. Code and supplementary material are available at https://github.com/m-ramchandran/cross-cluster.
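The core idea can be sketched as follows: partition the training data into clusters (e.g. with k-means), train one Random Forest per cluster, and combine the per-cluster predictions as a weighted ensemble. This is a minimal illustration only, using scikit-learn and simple uniform weights; the paper's actual method learns the ensemble weights, and all data and parameters below are synthetic placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

# Synthetic data standing in for a heterogeneous biological dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Step 1: partition the training set into clusters.
k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# Step 2: train one forest per cluster.
forests = []
for c in range(k):
    rf = RandomForestRegressor(n_estimators=50, random_state=0)
    rf.fit(X[labels == c], y[labels == c])
    forests.append(rf)

# Step 3: combine per-cluster predictions as a weighted ensemble.
# Uniform weights here for simplicity; the actual approach learns weights.
X_new = rng.normal(size=(10, 5))
per_cluster_preds = np.column_stack([rf.predict(X_new) for rf in forests])
weights = np.full(k, 1.0 / k)
ensemble_pred = per_cluster_preds @ weights
```

In practice the weighting step is where the method departs from simple averaging, and the number of clusters k is itself a tuning choice.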