We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces. Our work is motivated by practical scenarios where the target policy must be deterministic due to domain requirements, such as prescribing treatment dosage and duration in medicine. Although importance sampling (IS) provides a basic principle for OPE, it is ill-posed for deterministic target policies with continuous actions. Our main idea is to relax the target policy and pose the problem as kernel-based estimation, where we learn the kernel metric so as to minimize the overall mean squared error (MSE). Based on an analysis of the bias and variance, we present an analytic solution for the optimal metric. Whereas prior work has been limited to scalar action spaces or kernel bandwidth selection, our work goes a step further by handling vector action spaces and optimizing a full kernel metric. We show that our estimator is consistent and, through experiments on various domains, that it significantly reduces the MSE compared to baseline OPE methods.
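To make the relaxation concrete, the following is a minimal sketch of a kernel-relaxed importance-sampling estimator: the deterministic target action is smoothed by a Gaussian kernel with a positive-definite metric matrix, and the kernel weight replaces the (ill-posed) density ratio of a point-mass policy. The function names, array shapes, and the choice of a fixed Gaussian kernel here are illustrative assumptions, not the paper's exact estimator or metric-learning procedure.

```python
import numpy as np

def gaussian_kernel(diff, metric):
    """Multivariate Gaussian kernel with a metric matrix (illustrative choice).

    diff:   (n, d) action differences a_i - pi(x_i)
    metric: (d, d) positive-definite kernel metric Lambda
    """
    d = diff.shape[1]
    inv = np.linalg.inv(metric)
    quad = np.einsum('ni,ij,nj->n', diff, inv, diff)  # Mahalanobis-type quadratic form
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(metric))
    return norm * np.exp(-0.5 * quad)

def kernel_relaxed_ope(actions, rewards, behavior_density, target_actions, metric):
    """Kernel-relaxed IS estimate of a deterministic target policy's value.

    actions:          (n, d) logged actions a_i
    rewards:          (n,)   logged rewards r_i
    behavior_density: (n,)   behavior densities pi_b(a_i | x_i)
    target_actions:   (n, d) deterministic target actions pi(x_i)
    metric:           (d, d) kernel metric (fixed here; learned in the paper
                             by minimizing the MSE)
    """
    weights = gaussian_kernel(actions - target_actions, metric) / behavior_density
    return np.mean(weights * rewards)
```

In this sketch the metric is passed in as a fixed matrix; the paper's contribution is choosing it analytically to trade off the bias introduced by the kernel relaxation against the variance of the importance weights.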