Often, machine learning applications must cope with dynamic environments in which data are collected as continuous data streams of potentially infinite length and transient behavior. Compared to traditional (batch) data mining, stream-processing algorithms have additional requirements regarding computational resources and adaptability to data evolution. They must process instances incrementally, because the continuous flow of data prohibits storing it for multiple passes. Ensemble learning has achieved remarkable predictive performance in this scenario. Implemented as a set of (several) individual classifiers, ensembles are naturally amenable to task parallelism. However, the incremental learning and the dynamic data structures used to capture concept drift increase cache misses and hinder the benefits of parallelism. This paper proposes a mini-batching strategy that can improve memory access locality and the performance of several ensemble algorithms for stream mining in multi-core environments. With the aid of a formal framework, we demonstrate that mini-batching can significantly decrease the reuse distance (and hence the number of cache misses). Experiments with six different state-of-the-art ensemble algorithms on four benchmark datasets with varied characteristics show speedups of up to 5X on 8-core processors. These benefits come at the expense of a small reduction in predictive performance.
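To make the idea concrete, the following is a minimal sketch of mini-batched, test-then-train ensemble processing, not the paper's implementation: the `IncrementalClassifier` interface, the `MiniBatchEnsemble` class, and all method names are hypothetical stand-ins for a real stream-mining library's API. The key point is the loop order in `processBatch`: iterating over ensemble members in the outer loop keeps each member's model cache-resident while it consumes the whole batch, shortening the reuse distance compared to instance-at-a-time processing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical incremental classifier; not a real stream-mining library API. */
interface IncrementalClassifier {
    double predict(double[] instance);            // test-then-train: predict first...
    void train(double[] instance, double label);  // ...then update the model
}

/** Minimal sketch of mini-batched ensemble processing for better cache locality. */
class MiniBatchEnsemble {
    private final List<IncrementalClassifier> members;
    private final int batchSize;
    private final List<double[]> bufferX = new ArrayList<>();
    private final List<Double> bufferY = new ArrayList<>();

    MiniBatchEnsemble(List<IncrementalClassifier> members, int batchSize) {
        this.members = members;
        this.batchSize = batchSize;
    }

    /** Buffer the incoming instance; process the whole mini-batch once it is full. */
    void onInstance(double[] x, double y) {
        bufferX.add(x);
        bufferY.add(y);
        if (bufferX.size() == batchSize) {
            processBatch();
            bufferX.clear();
            bufferY.clear();
        }
    }

    /**
     * Members in the outer loop, instances in the inner loop: each member's
     * model is touched batchSize times in a row, so its data tends to stay in
     * cache. The outer loop is also a natural unit for task parallelism
     * (one task per ensemble member).
     */
    private void processBatch() {
        for (IncrementalClassifier member : members) {
            for (int i = 0; i < bufferX.size(); i++) {
                member.predict(bufferX.get(i));              // votes would be aggregated here
                member.train(bufferX.get(i), bufferY.get(i));
            }
        }
    }
}
```

The trade-off hinted at in the abstract follows from this buffering: predictions for instances inside a mini-batch are made before the models are updated on those instances, which can slightly reduce predictive performance under concept drift.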