R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents

Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output. When used to assist software developers, these models may make mistakes that users must go back and fix, or worse, introduce subtle bugs that users may miss entirely. We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE), an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility, using random samples from a generative model as a proxy for the unobserved possible intents of the end user. Our technique combines minimum-Bayes-risk decoding, dual decomposition, and decision diagrams in order to efficiently produce structured uncertainty summaries, given only sample access to an arbitrary generative model of code and an optional AST parser. We demonstrate R-U-SURE on three developer-assistance tasks, and show that it can be applied to different user interaction patterns without retraining the model and leads to more accurate uncertainty estimates than token-probability baselines.
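The core decision-theoretic idea in the abstract (treat samples from the model as stand-ins for the user's unknown intent, then pick the suggestion with the highest expected utility, i.e. minimum-Bayes-risk decoding) can be illustrated with a small sketch. This is not R-U-SURE itself: the paper combines structured utilities with dual decomposition and decision diagrams to mark uncertain regions anywhere in a suggestion, whereas this toy only decides where to truncate a suggestion. The utility function, the `penalty` parameter, and all function names below are hypothetical choices made for illustration.

```python
from typing import Sequence

Tokens = tuple[str, ...]


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading tokens on which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def utility(suggestion: Tokens, intent: Tokens, penalty: float = 1.0) -> float:
    """Hypothetical utility: +1 for each leading token the user can keep,
    -penalty for each suggested token after the first mismatch
    (tokens the user would have to delete)."""
    m = common_prefix_len(suggestion, intent)
    return m - penalty * (len(suggestion) - m)


def mbr_prefix(samples: list[Tokens], penalty: float = 1.0) -> Tokens:
    """Minimum-Bayes-risk choice among all prefixes of the samples,
    using the samples themselves as draws of the user's intent."""
    if not samples:
        raise ValueError("need at least one sample")
    candidates = {s[:k] for s in samples for k in range(len(s) + 1)}

    def expected_utility(c: Tokens) -> float:
        return sum(utility(c, s, penalty) for s in samples) / len(samples)

    # max() returns the first maximal element, so iterating over sorted
    # candidates breaks ties deterministically (shorter prefixes first).
    return max(sorted(candidates), key=expected_utility)


if __name__ == "__main__":
    # Four hypothetical model samples for the same completion point.
    samples = [
        ("for", "i", "in", "range", "(", "n", ")", ":"),
        ("for", "i", "in", "range", "(", "len", "(", "xs", ")", ")", ":"),
        ("for", "i", "in", "range", "(", "n", ")", ":"),
        ("for", "x", "in", "xs", ":"),
    ]
    print(mbr_prefix(samples, penalty=2.0))
    # -> ('for', 'i', 'in', 'range', '(')
```

With these samples the sketch keeps the prefix shared by the three `range`-style samples and stops where they diverge (`n` versus `len(xs)`): with `penalty=2.0`, appending either continuation lowers the expected utility, so the uncertain tail is left for the user to write.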

Related content

MoDELS

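The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants exchange cutting-edge research results and innovative practical experience on modeling and model-driven software and systems. This year's edition gives the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/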