Safe Linear Leveling Bandits - Zhuanzhi Papers

Bandits/Slot Machines · Linear · Extensibility · Bandits · Performer

December 13, 2021

Safe Linear Leveling Bandits

Ilker Demirel, Mehmet Ufuk Ozdemir, Cem Tekin

From arXiv; 17 pages, 4 figures

Multi-armed bandits (MAB) have been extensively studied in various settings where the objective is to maximize the actions' outcomes (i.e., rewards) over time. Since safety is crucial in many real-world problems, safe versions of MAB algorithms have also garnered considerable interest. In this work, we tackle a different critical task through the lens of linear stochastic bandits: keeping the actions' outcomes close to a target level while respecting a two-sided safety constraint, a task we call leveling. Such tasks are prevalent in numerous domains; many healthcare problems, for instance, require keeping a physiological variable within a range and preferably close to a target level. This radical change in the objective necessitates a new acquisition strategy, which is at the heart of any MAB algorithm. We propose SALE-LTS (Safe Leveling via Linear Thompson Sampling), an algorithm with a novel acquisition strategy tailored to this task, and show that it achieves sublinear regret with the same time and dimension dependence as previous works on the classical reward-maximization problem without any safety constraint. We demonstrate and discuss the algorithm's empirical performance in detail through thorough experiments.
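The abstract does not spell out the SALE-LTS acquisition strategy, so what follows is only a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes outcomes y = x^T θ + noise, a leveling loss |x^T θ − target|, a two-sided constraint low ≤ x^T θ ≤ high on the expected outcome, a known safe fallback arm, and a conservative confidence check with a fixed multiplier `beta`. The class `LevelingLinTS` and all of its parameters are hypothetical. Each round it draws θ̃ from the linear Thompson sampling posterior N(θ̂, v² V⁻¹), keeps only arms whose confidence interval lies inside [low, high], and plays the safe arm whose sampled outcome is closest to the target.

```python
import numpy as np


class LevelingLinTS:
    """Hypothetical sketch: linear Thompson sampling with a leveling objective.

    Assumptions (not from the paper): the expected outcome of arm x is
    x @ theta, the goal is to keep it near `target`, and an arm is treated as
    safe only if its confidence interval for x @ theta lies inside [low, high].
    """

    def __init__(self, dim, target, low, high, reg=1.0,
                 posterior_scale=0.5, beta=2.0, seed=0):
        self.target, self.low, self.high = target, low, high
        self.posterior_scale = posterior_scale  # v in the N(theta_hat, v^2 V^-1) sample
        self.beta = beta                        # fixed confidence multiplier (simplification)
        self.V = reg * np.eye(dim)              # regularized Gram matrix
        self.b = np.zeros(dim)                  # running sum of y * x
        self.rng = np.random.default_rng(seed)

    def select(self, arms, safe_fallback):
        """Return the index of the arm to play; `arms` has shape (K, dim)."""
        arms = np.asarray(arms, dtype=float)
        V_inv = np.linalg.inv(self.V)
        V_inv = (V_inv + V_inv.T) / 2           # enforce exact symmetry for sampling
        theta_hat = V_inv @ self.b              # ridge-regression estimate
        means = arms @ theta_hat
        # Confidence width ||x||_{V^-1} for every arm.
        widths = np.sqrt(np.einsum("kd,de,ke->k", arms, V_inv, arms))
        safe = ((means - self.beta * widths >= self.low)
                & (means + self.beta * widths <= self.high))
        if not safe.any():
            return safe_fallback                # assumed known-safe arm
        theta_tilde = self.rng.multivariate_normal(
            theta_hat, self.posterior_scale ** 2 * V_inv)
        # Leveling acquisition: distance of the sampled outcome from the target.
        scores = np.abs(arms @ theta_tilde - self.target)
        scores[~safe] = np.inf                  # never play an uncertified arm
        return int(np.argmin(scores))

    def update(self, x, y):
        """Add the observed outcome y of arm x to the ridge statistics."""
        x = np.asarray(x, dtype=float)
        self.V += np.outer(x, x)
        self.b += y * x


if __name__ == "__main__":
    arms = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.5, 1.1]])
    theta_true = np.array([1.0, 0.5])           # unknown to the learner
    learner = LevelingLinTS(dim=2, target=1.9, low=0.0, high=2.0,
                            posterior_scale=0.1)
    for _ in range(200):
        for x in arms[:3]:
            learner.update(x, x @ theta_true)   # noiseless feedback, for illustration
    print(learner.select(arms, safe_fallback=0))  # prints 2
```

In this toy setting the arm with expected outcome 2.05 is closest to the target 1.9 but violates high = 2.0, so the sketch plays the arm with expected outcome 1.5 instead. Using a fixed `beta` is a simplification; regret analyses of linear bandits typically use a confidence radius that grows with the number of rounds.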

Related Content

Bandits/Slot Machines

[Practical Book] Machine Learning Cheat Sheet, 135-page PDF

Zhuanzhi Member Services

127+ reads · November 20, 2020

Introduction to Linux, 96-slide PPT

Zhuanzhi Member Services

81+ reads · July 26, 2020

Knowledge Graph Reasoning, 50-slide PPT, Richard Socher, Chief Scientist at Salesforce

Zhuanzhi Member Services

111+ reads · June 10, 2020

A Comprehensive Scoping Review of Bayesian Networks in Healthcare: Past, Present and Future

Zhuanzhi Member Services

41+ reads · February 26, 2020

UC Berkeley CS189 Lecture Notes: "A Comprehensive Guide to Machine Learning", 185-page PDF

Zhuanzhi Member Services

162+ reads · January 16, 2020

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion, Xiang Yu, Assistant Professor, Department of Applied Mathematics, The Hong Kong Polytechnic University, 8th National Conference on Social Media Processing (SMP 2019)

Zhuanzhi Member Services

10+ reads · October 24, 2019

Latest Reinforcement Learning Tutorial, 17-page PDF

Zhuanzhi Member Services

182+ reads · October 11, 2019

[Survey] Scene Text Detection and Recognition with Deep Learning

Zhuanzhi Member Services

78+ reads · October 10, 2019

[UC Berkeley PhD Thesis] Learning to Generalize via Self-Supervised Prediction

Zhuanzhi Member Services

65+ reads · October 9, 2019

[ICML 2019 Tutorial] Safe Machine Learning, Silvia Chiappa, Jan Leike

Zhuanzhi Member Services

23+ reads · June 10, 2019

LibRec Picks: AutoML for Contextual Bandits

LibRec Intelligent Recommendation

7+ reads · September 19, 2019

Three Reinforcement Learning Papers: Avoiding Forgetting and More

CreateAMind

20+ reads · May 24, 2019

Curiosity in Animal Brains and Curiosity in Reinforcement Learning

CreateAMind

10+ reads · January 26, 2019

Inverse Reinforcement Learning: Motivation for Learning Human Priors

CreateAMind

16+ reads · January 18, 2019

Unsupervised Meta-Learning for Reinforcement Learning

CreateAMind

18+ reads · January 7, 2019

Ray RLlib: Scalable "Eighteen Dragon-Subduing Palms"

CreateAMind

9+ reads · December 28, 2018

carla Study Notes

CreateAMind

9+ reads · February 7, 2018

carla: Hands-On Results and Code

CreateAMind

7+ reads · February 3, 2018

A Family Tree of Reinforcement Learning

CreateAMind

26+ reads · August 2, 2017

Reinforcement Learning: cartpole_a3c

CreateAMind

9+ reads · July 21, 2017

Tracking Most Significant Arm Switches in Bandits

Arxiv

0+ reads · February 16, 2022

Safe Active Dynamics Learning and Control: A Sequential Exploration-Exploitation Framework

Arxiv

0+ reads · February 16, 2022

Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

Arxiv

0+ reads · February 14, 2022

The Impact of Batch Learning in Stochastic Linear Bandits

Arxiv

0+ reads · February 14, 2022

Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality

Arxiv

0+ reads · February 14, 2022

Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits

Arxiv

0+ reads · February 12, 2022

Adaptive Bandit Convex Optimization with Heterogeneous Curvature

Arxiv

0+ reads · February 12, 2022

Online Bayesian Recommendation with No Regret

Arxiv

0+ reads · February 12, 2022

Shuffle Private Linear Contextual Bandits

Arxiv

0+ reads · February 11, 2022

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+ reads · April 2, 2020
