Deep learning (DL) techniques have achieved great success in predictive accuracy on a variety of tasks, but deep neural networks (DNNs) have been shown to produce highly overconfident scores even for abnormal samples. Well-defined uncertainty indicates whether a model's output should (or should not) be trusted, and thus becomes critical in real-world scenarios, which typically involve input distributions shifted by many factors. Existing uncertainty approaches assume that testing samples from a different data distribution will induce unreliable model predictions and thus have higher uncertainty scores. They quantify model uncertainty by calibrating a DL model's confidence for a given input, and they evaluate their effectiveness on computer vision (CV) and natural language processing (NLP) tasks. However, the reliability of these methodologies may be compromised on programming tasks because of differences in data representation and shift patterns. In this paper, we first define three types of distribution shift in program data and build a large-scale shifted Java dataset. We implement two common programming language tasks on our dataset to study the effect of each distribution shift on DL model performance. We also propose a large-scale benchmark of existing state-of-the-art predictive uncertainty methods on programming tasks and investigate their effectiveness under data distribution shift. Experiments show that program distribution shift degrades DL model performance to varying degrees and that all existing uncertainty methods present certain limitations in quantifying uncertainty on program datasets.
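The abstract refers to quantifying uncertainty from a model's confidence for a given input. As a rough illustration only, the sketch below computes two widely used softmax-based uncertainty scores (one minus the maximum softmax probability, and predictive entropy) plus the expected calibration error (ECE) with equal-width bins. These are common baselines assumed here for illustration; they are not necessarily the methods benchmarked in the paper, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_uncertainty(probs: Sequence[float]) -> float:
    """Hypothetical helper: 1 - max softmax probability (higher = less confident)."""
    return 1.0 - max(probs)


def predictive_entropy(probs: Sequence[float]) -> float:
    """Hypothetical helper: Shannon entropy of the predictive distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Hypothetical helper: ECE as the sample-weighted |accuracy - confidence| gap per bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bin b covers (lo, hi]; the first bin also admits a confidence of exactly 0.
        idx = [i for i, c in enumerate(confidences)
               if (lo < c <= hi) or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(1 for i in idx if correct[i]) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece


if __name__ == "__main__":
    probs = softmax([2.0, 0.5, 0.1])  # illustrative logits, not real model output
    print(msp_uncertainty(probs), predictive_entropy(probs))
    print(expected_calibration_error([0.75, 0.75], [True, False]))
```

Under the assumption stated in the abstract, an input drawn from a shifted distribution should receive a higher score from `msp_uncertainty` or `predictive_entropy`, while a well-calibrated model should have a low ECE; the paper's finding is that this assumption does not always hold for program data.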