Estimating Predictive Uncertainty Under Program Data Distribution Shift

Deep learning (DL) techniques have achieved great success in predictive accuracy across a variety of tasks, but deep neural networks (DNNs) have been shown to produce highly overconfident scores even for abnormal samples. Well-defined uncertainty indicates whether a model's output should (or should not) be trusted, and it therefore becomes critical in real-world scenarios, which typically involve input distributions shifted by many factors. Existing uncertainty approaches assume that test samples from a different data distribution induce unreliable model predictions and thus receive higher uncertainty scores. These approaches quantify model uncertainty by calibrating a DL model's confidence on a given input, and their effectiveness has been evaluated on computer vision (CV) and natural language processing (NLP) tasks. However, their reliability may be compromised on programming tasks because of differences in data representations and shift patterns. In this paper, we first define three types of distribution shift in program data and build a large-scale shifted Java dataset. We implement two common programming language tasks on this dataset to study the effect of each distribution shift on DL model performance. We also propose a large-scale benchmark of existing state-of-the-art predictive uncertainty methods on programming tasks and investigate their effectiveness under data distribution shift. Experiments show that program distribution shift degrades DL model performance to varying degrees, and that all existing uncertainty methods have limitations in quantifying uncertainty on program datasets.
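
The abstract refers to "calibrating a model's confidence" and to "uncertainty scores." The Python sketch below illustrates what those terms usually mean using three generic measures: the maximum softmax probability as a confidence score, predictive entropy as an uncertainty score, and expected calibration error (ECE) computed over equal-width confidence bins. These are common textbook measures chosen only for illustration. The abstract does not name the specific uncertainty methods the paper benchmarks. The function names and the 10-bin default are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Entropy (in nats) of the predictive distribution; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """ECE: bin-size-weighted average of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bin b covers (lo, hi]; the first bin also admits a confidence of exactly 0.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece


if __name__ == "__main__":
    # Hypothetical logits for two samples: one confident, one near-uniform.
    logits = [[4.0, 0.5, 0.1], [1.0, 0.9, 1.1]]
    labels = [0, 1]
    probs = [softmax(z) for z in logits]
    confidences = [max(p) for p in probs]
    preds = [p.index(max(p)) for p in probs]
    correct = [y == yhat for y, yhat in zip(labels, preds)]
    print("entropy:", [round(predictive_entropy(p), 3) for p in probs])
    print("ECE:", round(expected_calibration_error(confidences, correct), 3))
```

The abstract states that these methods assume inputs from a shifted distribution should receive higher uncertainty scores. One simple way to check that assumption is to compute the same scores on an in-distribution test split and on a shifted split, then compare the results.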


Related content

MoDELS


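The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering. It is organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling. It will also showcase innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/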