Consider a tree $T=(V,E)$ with root $\circ$ and edge length function $\ell:E\to\mathbb{R}_+$. The phylogenetic covariance matrix of $T$ is the matrix $C$ with rows and columns indexed by the leaf set $L$ of $T$ and entries $C(i,j):=\sum_{e\in[i\wedge j,\circ]}\ell(e)$ for $i,j\in L$, where $i\wedge j$ denotes the most recent common ancestor of $i$ and $j$ and $[i\wedge j,\circ]$ is the set of edges on the path from that ancestor to the root. Recent work [15] has shown that, with overwhelmingly high probability, the phylogenetic covariance matrix of a large random binary tree $T$ becomes significantly sparse under a change of basis to the so-called Haar-like wavelets of $T$. Notably, this makes it possible to manipulate the spectrum of covariance matrices of large binary trees without storing them in computer memory, by instead performing two post-order traversals of the tree. Building on the methods of [15], this manuscript extends the sparsification result to the broader class of $k$-regular trees, for any given $k\ge2$. The extension is achieved by refining existing asymptotic formulas for the mean and variance of the internal path length of random $k$-regular trees, using properties and identities of hypergeometric functions.
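To make the definition of $C$ concrete, the following is a minimal brute-force sketch in Python that builds the dense matrix directly from the formula $C(i,j)=\sum_{e\in[i\wedge j,\circ]}\ell(e)$. It is not the traversal-based method of [15]. The function name `phylo_covariance`, the parent-pointer encoding of the tree, and the toy tree itself are assumptions made purely for illustration.

```python
def phylo_covariance(parent, length):
    """Dense phylogenetic covariance matrix of a rooted tree (illustrative only).

    parent: dict mapping each node to its parent; the root maps to None.
    length: dict mapping each non-root node to the length of the edge
            joining it to its parent.
    Returns (leaves, C), where C[a][b] is the total edge length on the path
    from the root to the most recent common ancestor of leaves[a], leaves[b].
    """
    def path_to_root(node):
        # Nodes visited when walking from `node` up to the root, inclusive.
        path = [node]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    def depth(node):
        # Sum of edge lengths between `node` and the root (0.0 for the root).
        return sum((length[n] for n in path_to_root(node)[:-1]), 0.0)

    # A leaf is a node that is nobody's parent.
    internal = {p for p in parent.values() if p is not None}
    leaves = sorted(n for n in parent if n not in internal)

    C = []
    for i in leaves:
        ancestors_i = set(path_to_root(i))
        row = []
        for j in leaves:
            # First ancestor of j (starting at j itself) shared with i.
            mrca = next(n for n in path_to_root(j) if n in ancestors_i)
            row.append(depth(mrca))
        C.append(row)
    return leaves, C


# Hypothetical toy tree: root "r" has children "a" (edge length 1) and the
# leaf "z" (edge length 4); "a" has leaf children "x" (2) and "y" (3).
parent = {"r": None, "a": "r", "z": "r", "x": "a", "y": "a"}
length = {"a": 1.0, "z": 4.0, "x": 2.0, "y": 3.0}
leaves, C = phylo_covariance(parent, length)
```

For this toy tree the leaves are ordered `x, y, z`, and the matrix is `[[3, 1, 0], [1, 4, 0], [0, 0, 4]]`: diagonal entries are root-to-leaf distances, and the off-diagonal entry for `x` and `y` is the length of the single edge they share. This direct construction needs memory quadratic in the number of leaves, which is the cost the sparsification result is meant to avoid.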