Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output. When used to assist software developers, these models may make mistakes that users must go back and fix, or worse, introduce subtle bugs that users may miss entirely. We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE), an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility, using random samples from a generative model as a proxy for the unobserved possible intents of the end user. Our technique combines minimum-Bayes-risk decoding, dual decomposition, and decision diagrams in order to efficiently produce structured uncertainty summaries, given only sample access to an arbitrary generative model of code and an optional AST parser. We demonstrate R-U-SURE on three developer-assistance tasks, and show that it can be applied to different user interaction patterns without retraining the model and that it leads to more accurate uncertainty estimates than token-probability baselines.
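To make the idea of using model samples as a proxy for user intent concrete, here is a minimal sketch of plain sample-based minimum-Bayes-risk selection. It is not the R-U-SURE procedure itself: it omits dual decomposition, decision diagrams, and uncertain regions, and it only picks one whole suggestion. The `utility` function (token-level similarity via `difflib`) and the name `mbr_select` are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def utility(suggestion: str, intent: str) -> float:
    """Hypothetical utility: token-level similarity in [0, 1].

    Stands in for "how useful is this suggestion if the user wanted `intent`".
    """
    return SequenceMatcher(None, suggestion.split(), intent.split()).ratio()


def mbr_select(samples):
    """Return the sample with the highest average utility over all samples.

    Each sample plays two roles: a candidate suggestion, and a stand-in for
    one possible (unobserved) user intent.
    """
    def expected_utility(candidate: str) -> float:
        return sum(utility(candidate, s) for s in samples) / len(samples)

    return max(samples, key=expected_utility)


# Toy samples, as if drawn from a generative model of code.
samples = [
    "return a + b",
    "return a + b",
    "return a - b",
    "return sum ( xs )",
]
best = mbr_select(samples)
print(best)
```

In this toy setting the selected suggestion is the one that agrees most, on average, with the other samples. Any utility that scores a suggestion against a hypothesized intent could be substituted for the similarity measure assumed here.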