In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback criterion. While the mean-reward criterion has been extensively studied, other measures that reflect an aversion to adverse outcomes, such as mean-variance or conditional value-at-risk (CVaR), can be of interest in critical applications (healthcare, agriculture). Algorithms have been proposed for such risk-aware measures under bandit feedback without contextual information. In this work, we study contextual bandits where such risk measures can be elicited as linear functions of the contexts through the minimization of a convex loss. A typical example that fits within this framework is the expectile measure, which is obtained as the solution of an asymmetric least-squares problem. Using the method of mixtures for supermartingales, we derive confidence sequences for the estimation of such risk measures. We then propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits. This approach requires solving a convex problem at each round, which we can relax by allowing only approximate solutions obtained by online gradient descent, at the cost of a slightly higher regret. We conclude by evaluating the resulting algorithms in numerical experiments.
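For concreteness, we recall the standard definition underlying the expectile example (not restated in the abstract itself): the $\tau$-expectile $e_\tau(Y)$ of a random variable $Y$ is the unique minimizer of an asymmetric least-squares loss,
\[
e_\tau(Y) \;=\; \operatorname*{arg\,min}_{m \in \mathbb{R}} \; \mathbb{E}\!\left[\,\lvert \tau - \mathbf{1}\{Y \le m\}\rvert \,(Y - m)^2\,\right], \qquad \tau \in (0,1),
\]
which recovers the mean at $\tau = 1/2$. In the contextual setting above, the natural working assumption is that this minimizer is linear in the context, $e_\tau(Y \mid x) = x^\top \theta^\star$ for an unknown parameter $\theta^\star$ (the notation $\theta^\star$ is ours, for illustration).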
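The abstract leaves the relaxed per-round update unspecified; the following is a minimal, hypothetical sketch of one online-gradient-descent step on the expectile loss above, assuming the linear model $x^\top\theta$. The function names, the step size lr, and the omission of a projection onto a bounded parameter set are illustrative assumptions, not the paper's algorithm.

import numpy as np

def expectile_grad(theta, x, y, tau):
    """Gradient in theta of the asymmetric squared loss
    |tau - 1{y <= x.theta}| * (y - x.theta)^2 (hypothetical helper)."""
    r = y - x @ theta                 # residual under the linear model
    w = tau if r > 0 else 1.0 - tau   # asymmetric weight from the indicator
    return -2.0 * w * r * x           # chain rule through m = x.theta

def ogd_step(theta, x, y, tau, lr=0.01):
    """One online-gradient-descent update on the latest (x, y) pair,
    replacing the exact per-round convex minimization."""
    return theta - lr * expectile_grad(theta, x, y, tau)

# Toy usage: stream of (context, reward) pairs updating theta online.
rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(1000):
    x = rng.normal(size=3)
    y = x @ np.array([1.0, -0.5, 0.2]) + rng.normal()
    theta = ogd_step(theta, x, y, tau=0.9)

A full algorithm would typically also project each iterate onto a bounded parameter set, as is standard in online-gradient-descent analyses; the sketch omits this for brevity.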