This paper proposes a new algorithm, referred to as GMAB, that combines concepts from the reinforcement learning domain of multi-armed bandits with random search strategies from the domain of genetic algorithms to solve discrete stochastic optimization problems via simulation. The focus is on noisy, large-scale problems that often involve many dimensions as well as multiple local optima. Our aim is to combine the ability of multi-armed bandits to cope with noisy simulation observations with the ability of genetic algorithms to handle high-dimensional solution spaces containing an enormous number of feasible solutions. To this end, a multi-armed bandit framework serves as the foundation, with every simulation observation incorporated into GMAB's memory. Based on this memory, genetic operators guide the search, providing powerful tools for both exploration and exploitation. Empirical results demonstrate that GMAB outperforms benchmark algorithms from the literature on a large variety of test problems. In all experiments, GMAB required considerably fewer simulations to achieve solutions similar to or (far) better than those generated by existing methods. At the same time, GMAB's runtime overhead is very small thanks to the proposed tree-based implementation of its memory. Furthermore, we prove its convergence to the set of global optima as the simulation effort goes to infinity.
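To illustrate the interplay the abstract describes, here is a minimal Python sketch of a GMAB-style loop. Everything concrete in it is an assumption for illustration only: the noisy objective simulate, the UCB-style selection rule ucb_score, the uniform-crossover and random-reset mutation operators, and all parameter values are hypothetical, and the bandit memory is kept as a plain hash map here rather than the tree-based structure the paper proposes.

import math
import random

def simulate(x):
    # Hypothetical noisy objective: maximize -sum of squares plus Gaussian noise.
    return -sum(v * v for v in x) + random.gauss(0.0, 1.0)

def ucb_score(mean, n_arm, n_total, c=2.0):
    # Bandit-style upper confidence bound over the averaged observations.
    return mean + c * math.sqrt(math.log(n_total) / n_arm)

def gmab(dim=5, low=-10, high=10, pop_size=20, budget=2000, mut_rate=0.2):
    memory = {}  # solution -> (num_observations, mean_reward); paper uses a tree
    total = 0
    population = [tuple(random.randint(low, high) for _ in range(dim))
                  for _ in range(pop_size)]
    while total < budget:
        # Observe each population member once and update its running mean.
        for x in population:
            r = simulate(x)
            n, mean = memory.get(x, (0, 0.0))
            memory[x] = (n + 1, mean + (r - mean) / (n + 1))
            total += 1
        # Parent selection: rank stored arms by a UCB over the bandit memory.
        ranked = sorted(memory,
                        key=lambda x: ucb_score(memory[x][1], memory[x][0], total),
                        reverse=True)
        parents = ranked[:pop_size]
        # Genetic operators: uniform crossover plus random-reset mutation.
        population = []
        for _ in range(pop_size):
            a, b = random.sample(parents, 2)
            child = [ai if random.random() < 0.5 else bi for ai, bi in zip(a, b)]
            child = [random.randint(low, high) if random.random() < mut_rate else v
                     for v in child]
            population.append(tuple(child))
    # Report the arm with the best averaged observation.
    return max(memory, key=lambda x: memory[x][1])

print(gmab())

Because the memory averages all observations ever made of a solution, revisiting a solution sharpens its value estimate instead of wasting the simulation, which is the bandit property the abstract refers to; the genetic operators then recombine the highest-ranked arms to reach unvisited regions of the solution space.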