Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks

Andrey Malinin, Neil Band, Alexander Ganshin, German Chesnokov, Yarin Gal, Mark J. F. Gales, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Denis Roginskiy, Mariya Shmatova, Panos Tigas, Boris Yangel

There has been significant research done on developing methods for improving robustness to distributional shift and uncertainty estimation. In contrast, only limited work has examined developing standard datasets and benchmarks for assessing these approaches. Additionally, most work on uncertainty estimation and robustness has developed new techniques based on small-scale regression or image classification tasks. However, many tasks of practical interest have different modalities, such as tabular data, audio, text, or sensor data, which offer significant challenges involving regression and discrete or continuous structured prediction. Thus, given the current state of the field, a standardized large-scale dataset of tasks across a range of modalities affected by distributional shifts is necessary. This will enable researchers to meaningfully evaluate the plethora of recently developed uncertainty quantification methods, as well as assessment criteria and state-of-the-art baselines. In this work, we propose the Shifts Dataset for evaluation of uncertainty estimates and robustness to distributional shift. The dataset, which has been collected from industrial sources and services, is composed of three tasks, with each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these data modalities and tasks are affected by real, 'in-the-wild' distributional shifts and pose interesting challenges with respect to uncertainty estimation. In this work, we provide a description of the dataset and baseline results for all tasks.
