The inference of minimum spanning arborescences within a set of objects is a general problem which translates into numerous application-specific unsupervised learning tasks. We introduce a unified and generic structure called edit arborescence that relies on edit paths between data in a collection, as well as the Min Edit Arborescence Problem, which asks for an edit arborescence that minimizes the sum of costs of its inner edit paths. Through the use of suitable cost functions, this generic framework allows to model a variety of problems. In particular, we show that by introducing encoding size preserving edit costs, it can be used as an efficient method for compressing collections of labeled graphs. Experiments on various graph datasets, with comparisons to standard compression tools, show the potential of our method.
翻译:一组天体内最小孔径的推论是一个一般性问题,它转化成许多与具体应用有关的、不受监督的学习任务。我们引入了一个统一和通用的结构,称为编辑孔径,它依赖于一个收藏中的数据之间的编辑路径,以及Min Edit Arborescente问题,它要求编辑孔径,以尽量减少其内部编辑路径的成本。通过使用适当的成本功能,这个通用框架可以模拟各种各样的问题。特别是,我们表明,通过引入编码大小保存编辑成本,它可以被用作压缩标签图表集的有效方法。各种图表数据集的实验,与标准压缩工具的比较,显示了我们方法的潜力。