While deep learning models have recently seen high uptake in the geosciences and are appealing for their ability to learn from minimally processed input data, they are black box models that offer no easy means of understanding how a decision is reached, which can be problematic, especially in safety-critical tasks. An alternative is to use simpler, more transparent white box models, in which task-specific feature construction replaces the more opaque feature discovery performed automatically within deep learning models. Using data from the Groningen Gas Field in the Netherlands, we build on an existing logistic regression model by adding four further features, discovered through elastic-net-driven data mining within the catch22 time series analysis package. We then evaluate the augmented logistic regression model against a deep (CNN) model, pre-trained on the Groningen data, at progressively increasing noise-to-signal ratios. We find that, at every ratio, our logistic regression model correctly detects every earthquake, while the deep model fails to detect nearly 20% of seismic events. This justifies at least a degree of caution in applying deep models, especially to data with higher noise-to-signal ratios.
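As an illustration of the evaluation setting described above, the following is a minimal sketch of how a trace might be corrupted at a chosen noise-to-signal ratio. It assumes the ratio is defined as noise power over signal power and that the noise is white and Gaussian; the abstract specifies neither. The function names, the synthetic trace, and the list of ratios are hypothetical and are not taken from the paper.

```python
import numpy as np


def add_noise_at_ratio(signal, nsr, rng):
    """Add white Gaussian noise scaled so that
    mean(noise**2) / mean(signal**2) equals nsr.

    The power-ratio definition is an assumption, not the paper's stated one.
    """
    signal = np.asarray(signal, dtype=float)
    noise = rng.standard_normal(signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Rescale the noise draw so its power is exactly nsr times the signal power.
    scale = np.sqrt(nsr * signal_power / noise_power)
    return signal + scale * noise


def measured_nsr(clean, noisy):
    """Recover the noise-to-signal power ratio from a clean/noisy pair."""
    clean = np.asarray(clean, dtype=float)
    residual = np.asarray(noisy, dtype=float) - clean
    return float(np.mean(residual ** 2) / np.mean(clean ** 2))


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)
# Synthetic stand-in for a seismic trace: a decaying 10 Hz sinusoid.
clean = np.exp(-3.0 * t) * np.sin(2.0 * np.pi * 10.0 * t)

# Hypothetical, progressively increasing ratios (not the paper's values).
ratios = [0.5, 1.0, 2.0, 4.0]
noisy_traces = {r: add_noise_at_ratio(clean, r, rng) for r in ratios}
```

Each entry of `noisy_traces` could then be passed to both detectors, so that they are compared on identical inputs at every ratio.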