Language models can learn a range of capabilities from unsupervised training on text corpora. However, to solve a particular problem (such as text summarization) it is typically necessary to fine-tune them on a task-specific dataset. It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons. However, collecting a large preference comparison dataset is still expensive -- and the learned reward models are unreliable out-of-distribution. We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning (RL). Specifically, we use bootstrap aggregating (bagging) to train an ensemble of reward models differing in the initialization of their final layer. Ensembles have proved successful in prior applications of active learning, but we find that in our setting ensemble active learning does not outperform random sampling. Further experiments show that while the aggregate predictions are well-calibrated, the ensemble's estimated epistemic uncertainty is only weakly correlated with model error. We suspect this is because the ensemble members are fine-tuned from a single model and so are similar to one another. This suggests current pre-training methods will need to be modified to support uncertainty estimation, e.g. by training multiple language models.
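As a concrete illustration of the setup described above, the following is a minimal sketch (not the paper's actual implementation) of a bagged reward-model ensemble in PyTorch. Each member shares a pretrained encoder, assumed fixed here and represented by a hypothetical `embed_fn`, differs only in the random initialization of its final scalar reward head, and is trained with the standard preference-comparison loss on a bootstrap resample of the data. The ensemble mean gives the aggregate reward prediction and the disagreement between members gives the epistemic-uncertainty estimate.

```python
# Illustrative sketch only; names (embed_fn, RewardHead, etc.) are assumptions,
# and the shared pretrained body is treated as frozen for simplicity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardHead(nn.Module):
    """Scalar reward head on top of a shared text embedding."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Independently initialized per ensemble member: this is the only
        # source of diversity besides the bootstrap resampling below.
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.linear(embedding).squeeze(-1)  # (batch,) scalar rewards


def preference_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: -log P(preferred beats rejected)."""
    return -F.logsigmoid(r_preferred - r_rejected).mean()


def train_bagged_ensemble(embed_fn, comparisons, hidden_dim=768,
                          n_members=5, epochs=3, lr=1e-3):
    """embed_fn: maps a list of texts to (batch, hidden_dim) embeddings from the
    shared pretrained model. comparisons: list of (preferred, rejected) text pairs.
    Returns a list of trained RewardHead modules."""
    n = len(comparisons)
    members = []
    for _ in range(n_members):
        head = RewardHead(hidden_dim)
        opt = torch.optim.Adam(head.parameters(), lr=lr)
        # Bootstrap aggregating: each member sees n comparisons drawn with replacement.
        idx = torch.randint(0, n, (n,)).tolist()
        for _ in range(epochs):
            for i in idx:
                preferred, rejected = comparisons[i]
                loss = preference_loss(head(embed_fn([preferred])),
                                       head(embed_fn([rejected])))
                opt.zero_grad()
                loss.backward()
                opt.step()
        members.append(head)
    return members


def ensemble_reward(members, embed_fn, text):
    """Mean reward = aggregate prediction; std across members = uncertainty estimate."""
    with torch.no_grad():
        rewards = torch.stack([head(embed_fn([text])) for head in members])
    return rewards.mean(), rewards.std()
```

Under this kind of setup, the per-member standard deviation is what would be fed to an active-learning acquisition rule or a risk-averse RL objective; the abstract's finding is that this disagreement signal correlates only weakly with actual model error when all members descend from one pretrained model.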