Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output. When used to assist software developers, these models may make mistakes that users must go back and fix, or worse, introduce subtle bugs that users may miss entirely. We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE), an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility, using random samples from a generative model as a proxy for the unobserved possible intents of the end user. Our technique combines minimum-Bayes-risk decoding, dual decomposition, and decision diagrams in order to efficiently produce structured uncertainty summaries, given only sample access to an arbitrary generative model of code and an optional AST parser. We demonstrate R-U-SURE on three developer-assistance tasks, and show that it can be applied to different user interaction patterns without retraining the model and that it leads to more accurate uncertainty estimates than token-probability baselines. We also release our implementation as an open-source library at https://github.com/google-research/r_u_sure.
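To make the sample-based minimum-Bayes-risk idea concrete, the following is a minimal sketch, not the R-U-SURE implementation. It treats each model sample as a hypothetical user intent and picks the candidate with the highest average utility against all samples. The function names (`token_overlap_utility`, `mbr_select`) and the token-overlap F1 utility are assumptions made for illustration only; the method described above additionally uses dual decomposition and decision diagrams and marks uncertain regions, none of which is shown here.

```python
from collections import Counter


def token_overlap_utility(suggestion: str, intent: str) -> float:
    """Toy utility (an assumption): F1 score of whitespace-token overlap."""
    s = Counter(suggestion.split())
    t = Counter(intent.split())
    overlap = sum((s & t).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(s.values())
    recall = overlap / sum(t.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(samples: list[str]) -> str:
    """Return the sample with the highest mean utility against all samples.

    Each sample doubles as a candidate suggestion and as a stand-in for one
    possible (unobserved) user intent.
    """
    def expected_utility(candidate: str) -> float:
        return sum(token_overlap_utility(candidate, s) for s in samples) / len(samples)

    return max(samples, key=expected_utility)


# Hypothetical samples drawn from a code model for the same prompt.
samples = [
    "return x + 1",
    "return x + 1",
    "return x - 1",
    "raise ValueError ( x )",
]
best = mbr_select(samples)
print(best)
```

In this toy setting the chosen suggestion is the one that agrees most with the other samples on average. A utility that also rewards flagging the tokens on which the samples disagree would move the sketch closer to the uncertainty-aware suggestions described above.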
Title: R-U-SURE? Uncertainty-Aware Code Suggestions by Maximizing Utility Across Random User Intents