Ultrametric matrices have a rich structure that is not apparent from their definition. Notably, the subclass of strictly ultrametric matrices are covariance matrices of certain weighted rooted binary trees. In applications, these matrices can be large and dense, making them difficult to store and handle. In this manuscript, we exploit the underlying tree structure of these matrices to sparsify them via a similarity transformation based on Haar-like wavelets. We show that, with overwhelmingly high probability, only an asymptotically negligible fraction of the off-diagonal entries in random but large strictly ultrametric matrices remain non-zero after the transformation; and develop a fast algorithm to compress such matrices directly from their tree representation. We also identify the subclass of matrices diagonalized by the wavelets and supply a sufficient condition to approximate the spectrum of strictly ultrametric matrices outside this subclass. Our methods give computational access to a covariance model of the microbiologists' Tree of Life, which was previously inaccessible due to its size, and motivate defining a new but wavelet-based phylogenetic $\beta$-diversity metric. Applying this metric to a metagenomic dataset demonstrates that it can provide novel insight into noisy high-dimensional samples and localize speciation events that may be most important in determining relationships between environmental factors and microbial composition.
翻译:超强矩阵具有从定义上看不出来的丰富结构。 值得注意的是, 严格超度矩阵的亚类是某些加权根基二进制树的共变基质。 在应用中, 这些矩阵可以是大而稠密的, 使得它们难以储存和处理。 在本手稿中, 我们利用这些矩阵的底底底树结构, 通过基于类似Haar的波盘的类似变形, 将它们封闭起来。 我们显示, 极有可能极有可能的是, 随机但大型严格超度矩阵的离对角条目中只有微小的一小部分, 在变异后仍然不为零; 并且 开发一种快速算法, 直接从树表中压缩这些矩阵。 我们还确定了由波盘分解的基质的亚类, 并提供了足够的条件, 以近距离矩阵模型外的光谱为大概范围。 我们的方法为微生物学家生命树的变异模型提供了计算途径, 由于其大小而以前无法进入, 并且激励定义一个新的但基于波质的血基系 $\betative 度测量度测量的测量要素。 将这种精确度关系转化为的模型, 和精确的模型的模型可以显示它之间的精确度关系。