Cycle representatives of persistent homology classes can be used to provide descriptions of topological features in data. However, the non-uniqueness of these representatives creates ambiguity and can lead to many different interpretations of the same set of classes. One approach to solving this problem is to optimize the choice of representative against some measure that is meaningful in the context of the data. In this work, we provide a study of the effectiveness and computational cost of several $\ell_1$-minimization optimization procedures for constructing homological cycle bases for persistent homology with rational coefficients in dimension one, including uniform-weighted and length-weighted edge-loss algorithms as well as uniform-weighted and area-weighted triangle-loss algorithms. We conduct these optimizations via standard linear programming methods, applying general-purpose solvers to optimize over column bases of simplicial boundary matrices. Our key findings are: (i) optimization is effective in reducing the size of cycle representatives, (ii) the computational cost of optimizing a basis of cycle representatives exceeds the cost of computing such a basis in most data sets we consider, (iii) the choice of linear solvers matters a lot to the computation time of optimizing cycles, (iv) the computation time of solving an integer program is not significantly longer than the computation time of solving a linear program for most of the cycle representatives, using the Gurobi linear solver, (v) strikingly, whether requiring integer solutions or not, we almost always obtain a solution with the same cost and almost all solutions found have entries in {-1, 0, 1} and therefore, are also solutions to a restricted $\ell_0$ optimization problem, and (vi) we obtain qualitatively different results for generators in Erd\H{o}s-R\'enyi random clique complexes.
翻译:持续同质类的周期代表可以用来描述数据中的表层特征。 但是,这些代表的不独特性造成了模糊性,并可能导致对同一类的多种不同的解释。 解决这一问题的一种方法是,根据数据范围内有意义的某种计量,优化代表的选择。 在这项工作中,我们研究几个美元\ell\_1美元-最小化优化程序的有效性和计算成本,用于构建具有一级合理系数的持久性同质循环基础。 但是,这些代表的非独特性造成了模糊性,并可能导致对同一组的同一类别进行许多不同的解释。 但是,这些代表的非独特性代表的不独特性,包括统一加权和长时间加权边缘损失算法以及统一加权和地区加权三角损失算法。 我们通过标准的线性编程方法进行这些优化,运用通用解答器优化简单边界矩阵的柱基。 我们的主要结论是:(一) 优化周期代表的规模是有效的, (二) 优化周期代表基础的计算成本成本成本总是超过我们所考虑的固定基数, (三) 线性解算方法的选择是几乎是直线性解的周期, 时间计算过程的周期,而不是最大幅度的周期, 最精确的计算过程的周期是 。