The matrix-based Renyi's entropy enables us to measure information quantities directly from given data, without the costly probability density estimation of the underlying distributions, and has therefore been widely adopted in numerous statistical learning and inference tasks. However, computing this information quantity exactly requires access to the eigenspectrum of a positive semi-definite (PSD) matrix $A$ whose size grows linearly with the number of samples $n$, resulting in an $O(n^3)$ time complexity that is prohibitive for large-scale applications. To address this issue, this paper takes advantage of stochastic trace approximations for matrix-based Renyi's entropy of arbitrary order $\alpha \in \mathbb{R}^+$, lowering the complexity by converting the entropy approximation into a matrix-vector multiplication problem. Specifically, we develop random approximations for integer orders $\alpha$ and polynomial series approximations (Taylor and Chebyshev) for non-integer orders $\alpha$, leading to an overall time complexity of $O(n^2sm)$, where $s, m \ll n$ denote the number of vector queries and the polynomial order, respectively. We theoretically establish statistical guarantees for all approximation algorithms and give the explicit order of $s$ and $m$ with respect to the approximation error $\varepsilon$, showing an optimal convergence rate for both parameters up to a logarithmic factor. Large-scale simulations and real-world applications validate the effectiveness of the developed approximations, demonstrating remarkable speedup with negligible loss in accuracy.
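To make the integer-order case concrete, the following is a minimal sketch of how a stochastic trace estimate can replace the eigendecomposition. It is not taken from the paper. It assumes the commonly used definition $S_\alpha(A) = \frac{1}{1-\alpha}\log_2 \mathrm{tr}(A^\alpha)$ for a PSD matrix $A$ with unit trace, which the abstract does not state, and it uses a Hutchinson-style estimator with Rademacher probe vectors, which is one possible choice of random query. The function names `renyi_entropy_exact` and `renyi_entropy_hutchinson` are hypothetical.

```python
import numpy as np


def renyi_entropy_exact(A, alpha):
    """Reference value from the full eigenspectrum (O(n^3)).

    Assumes A is symmetric PSD with trace 1 and alpha != 1.
    """
    eigvals = np.linalg.eigvalsh(A)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return np.log2(np.sum(eigvals ** alpha)) / (1.0 - alpha)


def renyi_entropy_hutchinson(A, alpha, s, rng):
    """Estimate the entropy for an integer alpha >= 2 with s probe vectors.

    tr(A^alpha) is approximated by the average of g^T A^alpha g over
    s random vectors g, so only matrix-vector products are needed.
    """
    n = A.shape[0]
    # Rademacher probes (entries +1 or -1); this choice is an assumption.
    G = rng.choice([-1.0, 1.0], size=(n, s))
    V = G.copy()
    for _ in range(alpha):
        V = A @ V  # after the loop, V holds A^alpha applied to every probe
    trace_est = np.sum(G * V) / s  # mean of g_i^T A^alpha g_i
    return np.log2(trace_est) / (1.0 - alpha)
```

Under these assumptions the estimator costs $\alpha$ products of an $n \times n$ matrix with an $n \times s$ block, which avoids the $O(n^3)$ eigendecomposition. A call such as `renyi_entropy_hutchinson(A, 2, 100, np.random.default_rng(0))` can be compared against `renyi_entropy_exact(A, 2)` on a small matrix to gauge how many probes a given accuracy requires. Non-integer orders would need a polynomial approximation of $A^\alpha$, which this sketch does not cover.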