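In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; the term is used to distinguish such parameters from the parameters of the model for the underlying system being analyzed.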
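
As a minimal sketch of this distinction, consider a coin with unknown heads probability. The heads probability is the model parameter, while the two shape values of its Beta prior are the hyperparameters. The Beta-Binomial model, the function names, and the numbers below are illustrative assumptions, not something taken from the page.

```python
from fractions import Fraction


def update_beta_prior(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with binomial
    observations yields a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


# Hyperparameters: they parameterize the prior, not the coin itself.
# (Illustrative values chosen for this sketch.)
prior_alpha, prior_beta = 2, 2

# Hypothetical observed data: 7 heads and 3 tails.
successes, failures = 7, 3

post_alpha, post_beta = update_beta_prior(prior_alpha, prior_beta, successes, failures)

# The model parameter (heads probability) is what gets inferred.
prior_mean = beta_mean(prior_alpha, prior_beta)
posterior_mean = beta_mean(post_alpha, post_beta)

print(f"prior mean of heads probability:     {float(prior_mean):.3f}")
print(f"posterior mean of heads probability: {float(posterior_mean):.3f}")
```

Changing `prior_alpha` and `prior_beta` changes the prior belief and therefore the posterior, even though the observed data stay the same; this is why hyperparameters are chosen or tuned separately from the model parameters.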

VIP content

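We address the problem of tuning hyperparameters (HPs) for imitation learning algorithms in the context of continuous control, when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature on imitation learning mostly assumes that this reward function is available for HP selection, but this is not a realistic setting. Indeed, if this reward function were available, it could be used directly for policy training, and imitation would not be necessary. To tackle this mostly ignored problem, we propose a number of possible proxies for the external reward. We evaluate them in an extensive empirical study (more than 10,000 agents across 9 environments) and make practical recommendations for selecting HPs. Our results show that while imitation learning algorithms are sensitive to HP choices, it is often possible to select good enough HPs through a proxy for the reward function.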

https://www.zhuanzhi.ai/paper/beffdb76305bfa324433d64e6975ec76


Latest papers

We propose a fast algorithm for the probabilistic solution of boundary value problems (BVPs), which are ordinary differential equations subject to boundary conditions. In contrast to previous work, we introduce a Gauss-Markov prior and tailor it specifically to BVPs, which allows computing a posterior distribution over the solution in linear time, at a quality and cost comparable to that of well-established, non-probabilistic methods. Our model further delivers uncertainty quantification, mesh refinement, and hyperparameter adaptation. We demonstrate how these practical considerations positively impact the efficiency of the scheme. Altogether, this results in a practically usable probabilistic BVP solver that is (in contrast to non-probabilistic algorithms) natively compatible with other parts of the statistical modelling tool-chain.
