The Extremal River Problem has emerged as a flagship problem for causal discovery from extreme values on a network. The task is to recover a river network using only the extreme flows measured at a set $V$ of stations, with no information on the stations' locations. We present QTree, a new, simple, and efficient algorithm for the Extremal River Problem that performs very well compared to existing methods, both on hydrology data and in simulations. QTree returns a root-directed tree and achieves almost perfect recovery on the Upper Danube network data, the existing benchmark data set, as well as on new data from the Lower Colorado River network in Texas. It can handle missing data, has an automated parameter tuning procedure, and runs in time $O(n |V|^2)$, where $n$ is the number of observations and $|V|$ is the number of nodes in the graph. Furthermore, we prove that the QTree estimator is consistent under a Bayesian network model for extreme values with noise. We also assess the small-sample behaviour of QTree through simulations and detail its strengths and possible limitations.