21 June 2021

Smooth Sequential Optimisation with Delayed Feedback


Srivas Chennu, Jamie Martin, Puli Liyanagama, Phil Mohr

From arXiv. Workshop on Bayesian causal inference for real world interactive systems, 27th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021)

Stochastic delays in feedback lead to unstable sequential learning using multi-armed bandits. Recently, empirical Bayesian shrinkage has been shown to improve reward estimation in bandit learning. Here, we propose a novel adaptation to shrinkage that computes smoothed reward estimates from windowed cumulative inputs, to deal with incomplete knowledge from delayed feedback and non-stationary rewards. Using numerical simulations, we show that this adaptation retains the benefits of shrinkage, and improves the stability of reward estimation by more than 50%. Our proposal reduces variability in treatment allocations to the best arm by up to 3.8x, and improves statistical accuracy, with up to 8% improvement in true positive rates and 37% reduction in false positive rates. Together, these advantages enable control of the trade-off between speed and stability of adaptation, and facilitate human-in-the-loop sequential optimisation.
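The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea: per-arm counts are summed over a trailing window, and the resulting success rates are pulled towards the grand mean with a textbook method-of-moments empirical Bayes shrinkage. The function names `windowed_totals` and `shrink`, the binary-reward setting, and the specific shrinkage formula are assumptions made for illustration and are not taken from the paper. It also assumes every arm has at least one trial inside the window.

```python
from statistics import mean, pvariance


def windowed_totals(trials, successes, window):
    """Sum per-arm trials and successes over the last `window` time steps.

    trials[t][k] and successes[t][k] are the counts for arm k at step t.
    (Hypothetical helper; the data layout is an assumption of this sketch.)
    """
    n = [sum(col) for col in zip(*trials[-window:])]
    s = [sum(col) for col in zip(*successes[-window:])]
    return n, s


def shrink(n, s):
    """Shrink per-arm success rates towards their grand mean.

    Generic method-of-moments empirical Bayes estimator, not necessarily
    the one used in the paper. Assumes every n[k] > 0.
    """
    raw = [sk / nk for sk, nk in zip(s, n)]
    # Binomial sampling variance of each raw rate
    sampling_var = [p * (1 - p) / nk for p, nk in zip(raw, n)]
    grand_mean = mean(raw)
    # Between-arm variance: observed spread minus average sampling noise,
    # floored at zero
    tau2 = max(0.0, pvariance(raw) - mean(sampling_var))
    shrunk = []
    for p, v in zip(raw, sampling_var):
        # Weight is near 1 for precisely measured arms, near 0 for noisy ones
        weight = tau2 / (tau2 + v) if (tau2 + v) > 0 else 0.0
        shrunk.append(grand_mean + weight * (p - grand_mean))
    return raw, shrunk


if __name__ == "__main__":
    trials = [[50, 50, 50], [50, 50, 50], [50, 50, 50]]
    successes = [[5, 6, 9], [4, 7, 10], [6, 5, 11]]
    n, s = windowed_totals(trials, successes, window=2)
    raw, shrunk = shrink(n, s)
    print("raw:", raw)
    print("shrunk:", shrunk)
```

In this sketch a shorter window reacts faster to recent feedback, while a longer window gives steadier estimates; this is one simple way to picture the speed versus stability trade-off that the abstract mentions.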

