Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits - 专知 (Zhuanzhi) Papers

PBT · Hyperparameters · Bandits · Performer · Optimizers

February 22, 2021

Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits

Jack Parker-Holder, Vu Nguyen, Stephen Roberts

From arXiv; camera-ready version, NeurIPS 2020

Many of the recent triumphs in machine learning are dependent on well-tuned hyperparameters. This is particularly prominent in reinforcement learning (RL) where a small change in the configuration can lead to failure. Despite the importance of tuning hyperparameters, it remains expensive and is often done in a naive and laborious way. A recent solution to this problem is Population Based Training (PBT) which updates both weights and hyperparameters in a single training run of a population of agents. PBT has been shown to be particularly effective in RL, leading to widespread use in the field. However, PBT lacks theoretical guarantees since it relies on random heuristics to explore the hyperparameter space. This inefficiency means it typically requires vast computational resources, which is prohibitive for many small and medium sized labs. In this work, we introduce the first provably efficient PBT-style algorithm, Population-Based Bandits (PB2). PB2 uses a probabilistic model to guide the search in an efficient way, making it possible to discover high performing hyperparameter configurations with far fewer agents than typically required by PBT. We show in a series of RL experiments that PB2 is able to achieve high performance with a modest computational budget.
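The contrast the abstract draws, random exploration in PBT versus a probabilistic model guiding the search in PB2, can be illustrated with a small toy. The sketch below is not the authors' implementation. It assumes a single scalar hyperparameter in [0, 1], a made-up reward function (`toy_improvement`), and a plain Gaussian-process upper-confidence-bound rule (`gp_ucb_pick`) that ignores the time-varying aspect of the problem. All names and constants are hypothetical.

```python
import math
import random


def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel on scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_ucb_pick(xs, ys, candidates, beta=2.0, noise=1e-2):
    """Return the candidate maximising GP posterior mean + beta * std."""
    mean_y = sum(ys) / len(ys)
    centred = [y - mean_y for y in ys]  # fit the GP on centred targets
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, centred)
    best, best_score = None, -math.inf
    for c in candidates:
        k = [rbf(c, x) for x in xs]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve(K, k)
        var = max(rbf(c, c) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
        score = mean + beta * math.sqrt(var)
        if score > best_score:
            best, best_score = c, score
    return best


def toy_improvement(h):
    """Hypothetical reward gain per training interval, peaked at h = 0.7."""
    return 1.0 - (h - 0.7) ** 2


def run(pop_size=4, rounds=6, seed=0):
    rng = random.Random(seed)
    agents = [{"h": rng.random(), "score": 0.0} for _ in range(pop_size)]
    xs, ys = [], []  # observed (hyperparameter, reward gain) pairs
    candidates = [i / 20 for i in range(21)]
    for _ in range(rounds):
        for a in agents:  # one training interval for every agent
            gain = toy_improvement(a["h"])
            a["score"] += gain
            xs.append(a["h"])
            ys.append(gain)
        agents.sort(key=lambda a: a["score"], reverse=True)
        # Exploit: the worst agent copies the best one's state
        # (the score stands in for copying network weights).
        agents[-1]["score"] = agents[0]["score"]
        # Explore: a model-guided choice replaces PBT's random perturbation.
        agents[-1]["h"] = gp_ucb_pick(xs, ys, candidates)
    return agents


if __name__ == "__main__":
    for agent in run():
        print(agent)
```

Replacing the `gp_ucb_pick` call with a random rescaling of the copied hyperparameter would give a PBT-style explore step. The paper's method additionally models how the reward changes over time, which this sketch leaves out.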

