Representing the elements of a database so that queries can be matched accurately is a central task in information retrieval. This has recently been achieved by embedding the graphical structure of the database into a manifold in a way that preserves its hierarchy. Persistent homology provides a rigorous characterization of the database topology in terms of both its hierarchy and its connectivity structure. We compute persistent homology on a variety of datasets and show that some commonly used embeddings fail to preserve connectivity. Moreover, we show that embeddings which successfully retain the database topology coincide in persistent homology. To capture this effect, we introduce the dilation-invariant bottleneck distance, which addresses metric distortion on manifolds. We use it to show that distances between topology-preserving embeddings of databases are small.