Graphs drawn in the plane are ubiquitous, arising from data sets through a variety of methods ranging from GIS analysis to image classification to shape analysis. A fundamental problem for this type of data is comparison: given a set of such graphs, can we rank how similar they are in a way that captures their geometric "shape" in the plane? In this paper, we explore a method to compare two such embedded graphs via a simplified combinatorial representation called a tail-less merge tree, which encodes the structure of the graph with respect to a fixed direction. First, we examine the properties of the branching distance, a distance designed to compare merge trees, and show that the distance as defined in previous work fails to satisfy some of the requirements of a metric. We then incorporate it into a new distance function, called the average branching distance, which compares graphs by looking at the branching distance between merge trees defined over many directions. Despite these theoretical issues, we show that the definition is still quite useful in practice by using our open-source code to cluster data sets of embedded graphs.
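To make the direction-based construction concrete, the following is a minimal Python sketch, not the authors' open-source implementation. It assumes a graph is given as a list of planar points and a list of index pairs, sweeps the vertices in order of height along a direction `theta`, and counts where sublevel-set components are born and where they merge (the events a merge tree records). It then averages a per-direction distance over evenly spaced directions. The names `sweep_events`, `average_over_directions`, and `toy_leaf_distance` are hypothetical, and `toy_leaf_distance` is only a stand-in used to exercise the averaging step; it is not the branching distance.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int]
Graph = Tuple[Sequence[Point], Sequence[Edge]]


def sweep_events(graph: Graph, theta: float) -> Tuple[int, int]:
    """Count (births, merges) of sublevel-set components in direction theta."""
    points, edges = graph
    ux, uy = math.cos(theta), math.sin(theta)
    # Height of a vertex is its inner product with the unit direction vector.
    height = [x * ux + y * uy for x, y in points]

    neighbors: List[List[int]] = [[] for _ in points]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    parent = list(range(len(points)))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    seen = set()
    births = merges = 0
    # Process vertices from lowest to highest (ties broken by index).
    for v in sorted(range(len(points)), key=lambda i: height[i]):
        # Components already present below v that v is attached to.
        roots = {find(w) for w in neighbors[v] if w in seen}
        if not roots:
            births += 1   # local minimum: a new component (leaf) appears
        elif len(roots) > 1:
            merges += 1   # v joins two or more existing components
        for r in roots:
            parent[r] = v
        seen.add(v)
    return births, merges


def average_over_directions(
    dist: Callable[[Graph, Graph, float], float],
    g1: Graph,
    g2: Graph,
    n_dirs: int = 64,
) -> float:
    """Average a per-direction distance over n_dirs evenly spaced directions."""
    thetas = [2.0 * math.pi * k / n_dirs for k in range(n_dirs)]
    return sum(dist(g1, g2, t) for t in thetas) / n_dirs


def toy_leaf_distance(g1: Graph, g2: Graph, theta: float) -> float:
    """Placeholder (NOT the branching distance): difference in leaf counts."""
    return float(abs(sweep_events(g1, theta)[0] - sweep_events(g2, theta)[0]))


# Example inputs: a "V"-shaped graph and a single horizontal edge.
v_graph: Graph = ([(-1.0, 1.0), (0.0, 0.0), (1.0, 1.0)], [(0, 1), (1, 2)])
segment: Graph = ([(0.0, 0.0), (1.0, 0.0)], [(0, 1)])
avg = average_over_directions(toy_leaf_distance, v_graph, segment, n_dirs=4)
```

A real comparison would pass a function that builds the two merge trees for direction `theta` and evaluates the branching distance between them in place of `toy_leaf_distance`; the averaging step itself is unchanged. Evenly spaced directions are one simple sampling choice and are an assumption of this sketch.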