Federated Stochastic Bandit Learning with Unobserved Context - 专知 (Zhuanzhi) Papers


Bandits · Context · Agents · Optimality · Collaboration

March 29, 2023

Federated Stochastic Bandit Learning with Unobserved Context


Jiabin Lin, Shana Moothedath

We study the problem of federated stochastic multi-armed contextual bandits with unknown contexts, in which M agents face different bandits and collaborate to learn. The communication model consists of a central server with which the agents periodically share their estimates, so that they learn to choose optimal actions and minimize the total regret. We assume that the exact contexts are not observable and that the agents observe only a distribution over the contexts. Such a situation arises, for instance, when the context itself is a noisy measurement or is based on a prediction mechanism. Our goal is to develop a distributed and federated algorithm that facilitates collaborative learning among the agents to select a sequence of optimal actions so as to maximize the cumulative reward. By performing a feature vector transformation, we propose an elimination-based algorithm and prove a regret bound for linearly parametrized reward functions. Finally, we validate the performance of our algorithm and compare it with a baseline approach using numerical simulations on synthetic data and on the real-world MovieLens dataset.
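The abstract does not spell out the feature vector transformation or the elimination rule, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a finite context set, a single parameter vector `theta` shared by all agents, and a transformation that replaces the unobserved feature vector by its expectation under the observed context distribution, psi[a] = sum_c mu[c] * phi[a, c]. All function names, the pull schedule `4 ** phase`, and the confidence width `2 ** -phase` are hypothetical choices made for illustration.

```python
import numpy as np


def expected_features(phi, mu):
    """Average the feature vectors over the observed context distribution.

    phi: array (K, C, d), feature vector of each (action, context) pair.
    mu:  array (C,), observed distribution over the C contexts.
    Returns psi with shape (K, d), where psi[a] = sum_c mu[c] * phi[a, c].
    """
    return np.einsum("c,kcd->kd", mu, phi)


def local_statistics(phi, mu, theta, active, pulls, rng, noise_std=0.1):
    """One agent plays every active action `pulls` times and summarizes the data."""
    psi = expected_features(phi, mu)
    d = phi.shape[-1]
    V, b = np.zeros((d, d)), np.zeros(d)
    for a in active:
        for _ in range(pulls):
            c = rng.choice(len(mu), p=mu)  # true context: drawn, but never observed
            reward = phi[a, c] @ theta + noise_std * rng.standard_normal()
            V += np.outer(psi[a], psi[a])  # regress on psi[a], not on phi[a, c]
            b += psi[a] * reward
    return V, b


def server_estimate(stats, reg=1.0):
    """Central server: pool the agents' statistics into one ridge estimate."""
    d = stats[0][1].shape[0]
    V = reg * np.eye(d) + sum(s[0] for s in stats)
    b = sum(s[1] for s in stats)
    return np.linalg.solve(V, b)


def eliminate(psi, active, theta_hat, width):
    """Keep actions whose estimated reward is within 2 * width of the best one."""
    est = psi[active] @ theta_hat
    return [a for a, e in zip(active, est) if e >= est.max() - 2 * width]


def run_demo(seed=0, M=3, K=5, C=4, d=3, phases=4):
    """Simulate M agents with different action features and context distributions."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d)  # assumed shared across agents, unknown to them
    phis = [rng.normal(size=(K, C, d)) for _ in range(M)]
    mus = [rng.dirichlet(np.ones(C)) for _ in range(M)]
    active = [list(range(K)) for _ in range(M)]
    theta_hat = np.zeros(d)
    for phase in range(1, phases + 1):
        pulls = 4 ** phase           # illustrative schedule, not from the paper
        width = 2.0 ** (-phase)      # illustrative confidence width
        stats = [
            local_statistics(phis[i], mus[i], theta, active[i], pulls, rng)
            for i in range(M)
        ]
        theta_hat = server_estimate(stats)  # the periodic communication step
        active = [
            eliminate(expected_features(phis[i], mus[i]), active[i], theta_hat, width)
            for i in range(M)
        ]
    return active, theta_hat
```

Calling `run_demo()` returns each agent's surviving action set and the pooled estimate. Regressing rewards on `psi[a]` is reasonable here because, with a linear reward, the expected reward given the context distribution is `psi[a] @ theta`; the gap between `phi[a, c]` and `psi[a]` acts as additional zero-mean noise. The schedule above is not tuned to match any regret guarantee.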


