以Lyapunov为基础的 " 强盗反馈最佳控制优化 " 方法 (A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback) - 专知论文

会员服务 ·

0

赌博机/老虎机 · 优化器 · 约束优化 · 约束 · 总体代价 ·

2022 年 1 月 23 日

A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback

翻译：以Lyapunov为基础的 " 强盗反馈最佳控制优化 " 方法

Semih Cayci,Yilin Zheng,Atilla Eryilmaz

In a wide variety of applications including online advertising, contractual hiring, and wireless scheduling, the controller is constrained by a stringent budget constraint on the available resources, which are consumed in a random amount by each action, and a stochastic feasibility constraint that may impose important operational limitations on decision-making. In this work, we consider a general model to address such problems, where each action returns a random reward, cost, and penalty from an unknown joint distribution, and the decision-maker aims to maximize the total reward under a budget constraint $B$ on the total cost and a stochastic constraint on the time-average penalty. We propose a novel low-complexity algorithm based on Lyapunov optimization methodology, named ${\tt LyOn}$, and prove that for $K$ arms it achieves $O(\sqrt{K B\log B})$ regret and zero constraint-violation when $B$ is sufficiently large. The low computational cost and sharp performance bounds of ${\tt LyOn}$ suggest that Lyapunov-based algorithm design methodology can be effective in solving constrained bandit optimization problems.

翻译：在各种应用中,包括网上广告、合同雇用和无线日程安排,控制员受到以下因素的限制:对现有资源的严格预算限制,每项行动都随机消耗大量资源,以及可能给决策造成重大业务限制的随机可行性限制。在这项工作中,我们考虑一种解决这类问题的一般模式,即每次行动都带来随机报酬、费用和由未知的联合分配所施加的惩罚,决策者的目标是在预算限制下,在总成本的B美元和对时间平均罚款的随机限制下,最大限度地获得全部报酬。我们提议一种基于Lyapunov优化方法的新的低复杂性算法,名为$&t LyOn},并证明如果$B足够大,对$(sqrt{K B\log B})产生遗憾和零限制。低计算成本和急性性性能约束为$@tt LyOn}表明,基于Lyapunov的算法设计方法能够有效地解决限制的波段优化问题。

0

相关内容

赌博机/老虎机

赌博机/老虎机

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于神经网络和群体智能的稀疏表示算法研究

国家自然科学基金

1+阅读 · 2014年12月31日

基于博弈论的电力广域控制系统信息安全建模和分析方法研究

国家自然科学基金

2+阅读 · 2014年12月31日

控制系统的约束矩阵方程及其高效数值算法

国家自然科学基金

0+阅读 · 2013年12月31日

进化规划算法的计算时间难题研究

国家自然科学基金

0+阅读 · 2010年12月31日

Petri网可重写理论及在服务组合中的应用

国家自然科学基金

0+阅读 · 2009年12月31日

多用户MIMO-OFDMA系统中基于混合反馈的资源管理研究

国家自然科学基金

0+阅读 · 2009年12月31日

扰动下一般复杂网络鲁棒自适应牵制控制、同步及H∞#20248;化

国家自然科学基金

0+阅读 · 2009年12月31日

超过程及相关SPDE的研究

国家自然科学基金

0+阅读 · 2008年12月31日

启发式算法设计中的骨架分析与应用

国家自然科学基金

0+阅读 · 2008年12月31日

图的多项式研究及应用

国家自然科学基金

1+阅读 · 2008年12月31日

Memory-Constrained Policy Optimization

Arxiv

0+阅读 · 2022年4月20日

Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs

Arxiv

0+阅读 · 2022年4月20日

Tight Last-Iterate Convergence of the Extragradient Method for Constrained Monotone Variational Inequalities

Arxiv

0+阅读 · 2022年4月20日

RIS-Assisted Cooperative NOMA with SWIPT

RIS-Assisted Cooperative NOMA with SWIPT

Arxiv

0+阅读 · 2022年4月18日

Risk and optimal policies in bandit experiments

Risk and optimal policies in bandit experiments

Arxiv

0+阅读 · 2022年4月18日

Rank-Constrained Least-Squares: Prediction and Inference

Arxiv

0+阅读 · 2022年4月17日

Quantized Federated Learning under Transmission Delay and Outage Constraints

Arxiv

0+阅读 · 2022年4月17日

Towards a Stronger Theory for Permutation-based Evolutionary Algorithms

Arxiv

0+阅读 · 2022年4月15日

Finding Hall blockers by matrix scaling

Arxiv

0+阅读 · 2022年4月15日

Feature Compression for Rate Constrained Object Detection on the Edge

Arxiv

0+阅读 · 2022年4月15日

VIP会员

文章信息

相关主题

赌博机/老虎机

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】多目标奖励与偏好优化：理论与算法

《无形的防御者？将定向能武器集成到反无人机框架的机遇与挑战》报告

自主化海军：海上无人系统与未来海战

迈向智能体系统规模化的科学

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Memory-Constrained Policy Optimization

Arxiv

0+阅读 · 2022年4月20日

Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs

Arxiv

0+阅读 · 2022年4月20日

Tight Last-Iterate Convergence of the Extragradient Method for Constrained Monotone Variational Inequalities

Arxiv

0+阅读 · 2022年4月20日

RIS-Assisted Cooperative NOMA with SWIPT

RIS-Assisted Cooperative NOMA with SWIPT

Arxiv

0+阅读 · 2022年4月18日

Risk and optimal policies in bandit experiments

Risk and optimal policies in bandit experiments

Arxiv

0+阅读 · 2022年4月18日

Rank-Constrained Least-Squares: Prediction and Inference

Arxiv

0+阅读 · 2022年4月17日

Quantized Federated Learning under Transmission Delay and Outage Constraints

Arxiv

0+阅读 · 2022年4月17日

Towards a Stronger Theory for Permutation-based Evolutionary Algorithms

Arxiv

0+阅读 · 2022年4月15日

Finding Hall blockers by matrix scaling

Arxiv

0+阅读 · 2022年4月15日

Feature Compression for Rate Constrained Object Detection on the Edge

Arxiv

0+阅读 · 2022年4月15日

相关基金

基于神经网络和群体智能的稀疏表示算法研究

国家自然科学基金

1+阅读 · 2014年12月31日

基于博弈论的电力广域控制系统信息安全建模和分析方法研究

国家自然科学基金

2+阅读 · 2014年12月31日

控制系统的约束矩阵方程及其高效数值算法

国家自然科学基金

0+阅读 · 2013年12月31日

进化规划算法的计算时间难题研究

国家自然科学基金

0+阅读 · 2010年12月31日

Petri网可重写理论及在服务组合中的应用

国家自然科学基金

0+阅读 · 2009年12月31日

多用户MIMO-OFDMA系统中基于混合反馈的资源管理研究

国家自然科学基金

0+阅读 · 2009年12月31日

扰动下一般复杂网络鲁棒自适应牵制控制、同步及H∞#20248;化

国家自然科学基金

0+阅读 · 2009年12月31日

超过程及相关SPDE的研究

国家自然科学基金

0+阅读 · 2008年12月31日

启发式算法设计中的骨架分析与应用

国家自然科学基金

0+阅读 · 2008年12月31日

图的多项式研究及应用

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员