Adapting machine learning algorithms to better handle clustering or batch effects within training data sets is important across a wide variety of biological applications. This article considers the effect of ensembling Random Forest learners trained on clusters within a single data set with heterogeneity in the distribution of the features. We find that constructing ensembles of forests trained on clusters determined by algorithms such as k-means results in significant improvements in accuracy and generalizability over the traditional Random Forest algorithm. We denote our novel approach as the Cross-Cluster Weighted Forest, and examine its robustness to various data-generating scenarios and outcome models. Furthermore, we explore the influence of the data-partitioning and ensemble weighting strategies on the benefits of our method over the existing paradigm. Finally, we apply our approach to cancer molecular profiling and gene expression data sets that are naturally divisible into clusters and illustrate that our approach outperforms the classic Random Forest. Code and supplementary material are available at https://github.com/m-ramchandran/cross-cluster.
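The core idea described above can be sketched in a few lines: partition the training data with k-means, fit one Random Forest per cluster, and combine the per-cluster forests with ensemble weights. The sketch below uses uniform weights as a placeholder; the paper's actual cross-cluster weighting strategy, hyperparameters, and preprocessing may differ, and all variable names here are illustrative.

```python
# Minimal sketch of the cluster-then-ensemble idea, assuming a regression
# setting with synthetic data. Uniform weights stand in for the paper's
# cross-cluster weighting scheme.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=600, n_features=10, random_state=0)

# Step 1: partition the training set into k clusters on the features.
k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# Step 2: fit one Random Forest per cluster.
forests = []
for c in range(k):
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X[labels == c], y[labels == c])
    forests.append(rf)

# Step 3: combine per-cluster forests with ensemble weights
# (uniform here; the paper studies alternative weighting strategies).
weights = np.full(k, 1.0 / k)

def ensemble_predict(X_new):
    preds = np.stack([f.predict(X_new) for f in forests])
    return weights @ preds

print(ensemble_predict(X[:5]).shape)  # (5,)
```

A key design point, reflected in step 3, is that every forest predicts on every new observation regardless of cluster membership; the weighting, not hard cluster assignment, determines each forest's influence.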