Understanding the response of an output variable to multi-dimensional inputs lies at the heart of many data exploration endeavours. Topology-based methods, in particular Morse theory and persistent homology, provide a useful framework for studying this relationship, as phenomena of interest often appear naturally as fundamental features. The Morse-Smale complex captures a wide range of features by partitioning the domain of a scalar function into piecewise monotonic regions, while persistent homology provides a means to study these features at different scales of simplification. Previous work demonstrated how to compute such a representation and showed its usefulness for gaining insight into multi-dimensional data. However, exploration of the multi-scale nature of the data was limited to selecting a single simplification threshold from a plot of the region count. In this paper, we present a novel tree visualization that provides a concise overview of the entire hierarchy of topological features. The structure of the tree provides initial insight into the distribution, size, and stability of all partitions. We use regression analysis to fit linear models in each partition, and develop local and relative measures to further assess the uniqueness and importance of each partition, especially with respect to its parents and children in the feature hierarchy. The expressiveness of the tree visualization becomes apparent when we encode such measures using colors, and the layout allows an unprecedented level of control over feature selection during exploration. For instance, selecting features from multiple scales of the hierarchy enables a more nuanced exploration. Finally, we demonstrate our approach using examples from several scientific domains.