Our intention is to provide a definitive reference on what it would take to safely make use of generative/predictive models in the absence of a solution to the Eliciting Latent Knowledge problem. Furthermore, we believe that large language models can be understood as such predictive models of the world, and that this conceptualization raises significant opportunities for their safe yet powerful use via carefully conditioning them to predict desirable outputs. Unfortunately, such approaches also raise a variety of potentially fatal safety problems, particularly surrounding situations where predictive models predict the output of other AI systems, potentially unbeknownst to us. There are, however, numerous potential solutions to such problems, primarily via carefully conditioning models to predict the things we want (e.g. humans) rather than the things we don't (e.g. malign AIs). Furthermore, due to the simplicity of the prediction objective, we believe that predictive models present the easiest inner alignment problem that we are aware of. As a result, we think that conditioning approaches for predictive models represent the safest known way of eliciting human-level and slightly superhuman capabilities from large language models and other similar future models.