As the costs of sensors and associated IT infrastructure decrease, as exemplified by the Internet of Things, increasing volumes of observational data are becoming available to environmental scientists. However, as the number of observation sites increases, so too does the opportunity for data quality issues to emerge, particularly given that many of these sensors do not have the benefit of official maintenance teams. To realise the value of crowd-sourced, 'Internet of Things' type observations for environmental modelling, we require approaches that automate the detection of outliers during the data modelling process, so that they do not contaminate estimates of the true distribution of the phenomenon of interest. To this end, we present a Bayesian deep learning approach for spatio-temporal modelling of environmental variables with automatic outlier detection. Our approach implements a Gaussian-uniform mixture density network with two purposes, modelling the phenomenon of interest and learning to classify and ignore outliers, which are achieved simultaneously, each by a specifically designed branch of the neural network. For our example application, we use the Met Office's Weather Observation Website data, an archive of observations from around 1,900 privately run, unofficial weather stations across the British Isles. Using data on surface air temperature, we demonstrate how our deep mixture model approach enables highly skilful modelling of the spatio-temporal temperature distribution without contamination from spurious observations. We hope that adoption of our approach will help unlock the potential of incorporating a wider range of observation sources, including crowd-sourced ones, into future environmental models.
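To make the Gaussian-uniform mixture idea concrete, the following is a minimal NumPy sketch of the kind of likelihood such a model could optimise. It is not the authors' implementation. The function names, the fixed uniform bounds `lo` and `hi`, and the example values are all assumptions made for illustration. In a mixture density network, `mu`, `sigma` and `p_outlier` would be produced by the network's branches rather than passed in by hand.

```python
import numpy as np


def _log_components(y, mu, sigma, p_outlier, lo, hi):
    """Log-weighted densities of the Gaussian (inlier) and uniform (outlier) parts.

    Assumes every observation y lies inside [lo, hi]; the bounds are a
    hypothetical choice, not taken from the source.
    """
    log_gauss = (-0.5 * np.log(2.0 * np.pi) - np.log(sigma)
                 - 0.5 * ((y - mu) / sigma) ** 2)
    log_inlier = np.log1p(-p_outlier) + log_gauss
    # Broadcast the constant uniform log-density to the shape of the data.
    log_outlier = (np.log(p_outlier) - np.log(hi - lo)) + np.zeros_like(log_inlier)
    return log_inlier, log_outlier


def mixture_nll(y, mu, sigma, p_outlier, lo, hi):
    """Mean negative log-likelihood of the Gaussian-uniform mixture."""
    log_inlier, log_outlier = _log_components(y, mu, sigma, p_outlier, lo, hi)
    # logaddexp evaluates log(exp(a) + exp(b)) without underflow.
    return -np.mean(np.logaddexp(log_inlier, log_outlier))


def outlier_posterior(y, mu, sigma, p_outlier, lo, hi):
    """Posterior probability that each observation came from the uniform part."""
    log_inlier, log_outlier = _log_components(y, mu, sigma, p_outlier, lo, hi)
    return np.exp(log_outlier - np.logaddexp(log_inlier, log_outlier))


# Illustrative temperatures in degrees Celsius; the last one is implausible.
y = np.array([10.0, 10.5, 45.0])
post = outlier_posterior(y, mu=10.0, sigma=1.0, p_outlier=0.05, lo=-50.0, hi=60.0)
nll = mixture_nll(y, mu=10.0, sigma=1.0, p_outlier=0.05, lo=-50.0, hi=60.0)
```

Because the uniform component places a small but non-zero density everywhere in `[lo, hi]`, an observation far from `mu` is explained by the outlier component instead of pulling the Gaussian parameters toward it. This is the sense in which a mixture of this form can learn to ignore spurious readings.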