While long short-term memory (LSTM) models have demonstrated strong performance in streamflow prediction, there are major risks in applying them to contiguous regions with no gauges, known as the prediction in ungauged regions (PUR) problem. However, softer data such as the flow duration curve (FDC) may already be available from nearby stations, or may become available. Here we demonstrate that sparse FDC data can be migrated and assimilated by an LSTM-based network via an encoder. A stringent region-based holdout test showed a median Kling-Gupta efficiency (KGE) of 0.62 for a US dataset, substantially higher than previous state-of-the-art global-scale ungauged basin tests. The baseline model without FDC was already competitive (median KGE 0.56), but integrating FDCs added substantial value. Because of the inaccurate representation of inputs, the baseline models might sometimes produce catastrophic results. Model generalizability was further meaningfully improved by compiling an ensemble of models with different input selections.
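For readers unfamiliar with the two quantities named above, the following is a minimal sketch of how an FDC and the KGE are commonly computed. It assumes the original KGE formulation (correlation, variability ratio, and bias ratio) and a percentile-based FDC; the abstract does not state which KGE variant or FDC discretization the study uses, and the function names `flow_duration_curve` and `kge` are hypothetical helpers, not the authors' code.

```python
import numpy as np


def flow_duration_curve(flow, exceedance_pct):
    """Return the flow exceeded `exceedance_pct` percent of the time.

    Hypothetical helper: assumes a simple percentile-based FDC.
    """
    flow = np.asarray(flow, dtype=float)
    pct = np.asarray(exceedance_pct, dtype=float)
    # The flow exceeded p% of the time is the (100 - p)th percentile.
    return np.percentile(flow, 100.0 - pct)


def kge(sim, obs):
    """Kling-Gupta efficiency, assuming the original three-term form."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Illustrative values only, not data from the study.
obs = np.array([1.0, 3.0, 2.0, 8.0, 5.0, 4.0])
sim = 2.0 * obs
fdc = flow_duration_curve(obs, [5, 25, 50, 75, 95])
score = kge(sim, obs)
```

A perfect simulation gives a KGE of 1, and the score decreases as correlation, variability, or bias departs from the observations. The FDC values returned here are non-increasing in exceedance probability, which is the property that makes a sparse set of them usable as a compact basin descriptor.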