The recently developed matrix-based Rényi's entropy enables measurement of information in data simply using the eigenspectrum of symmetric positive semidefinite (PSD) matrices in a reproducing kernel Hilbert space, without estimating the underlying data distribution. This intriguing property has led to the wide adoption of this information measure in statistical inference and learning tasks. However, computing this quantity involves the trace of a PSD matrix $G$ raised to the power $\alpha$ (i.e., $\mathrm{tr}(G^\alpha)$), with a typical complexity of nearly $O(n^3)$, which severely hampers its practical use when the number of samples $n$ is large. In this work, we present computationally efficient approximations to this entropy functional that can reduce its complexity to significantly less than $O(n^2)$. To this end, we first develop randomized approximations to $\mathrm{tr}(G^\alpha)$ that transform the trace estimation into matrix-vector multiplication problems. We extend this strategy to arbitrary values of $\alpha$ (integer or non-integer). We then establish the connection between the matrix-based Rényi's entropy and PSD matrix approximation, which enables us to exploit both the clustering and block low-rank structure of $G$ to further reduce the computational cost. We provide theoretical guarantees on approximation accuracy and illustrate the properties of the different approximations. Large-scale experimental evaluations on both synthetic and real-world data corroborate our theoretical findings, showing promising speedup with negligible loss in accuracy.
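To make the idea of replacing an eigendecomposition with matrix-vector products concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the standard definition of the matrix-based entropy, $S_\alpha(G) = \frac{1}{1-\alpha}\log_2 \mathrm{tr}(G^\alpha)$ with $G$ a trace-normalized Gram matrix (the abstract does not state this formula), and it uses the classical Hutchinson estimator with Rademacher probe vectors for integer $\alpha \ge 2$ only. The function names, the Gaussian kernel, and the parameters `sigma` and `num_probes` are illustrative assumptions; the non-integer case and the low-rank refinements mentioned above are not covered.

```python
import numpy as np


def gaussian_gram(X, sigma=1.0):
    """Trace-normalized Gaussian Gram matrix (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    K = np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))
    return K / np.trace(K)  # normalize so that tr(G) = 1


def exact_trace_power(G, alpha):
    """Reference value of tr(G^alpha) via a full eigendecomposition, O(n^3)."""
    w = np.clip(np.linalg.eigvalsh(G), 0.0, None)  # clip tiny negative eigenvalues
    return float(np.sum(w ** alpha))


def hutchinson_trace_power(G, alpha, num_probes=64, seed=0):
    """Hutchinson estimate of tr(G^alpha) for integer alpha >= 2.

    Uses only products of G with a block of probe vectors, so the cost is
    O(alpha * num_probes * n^2) for a dense G.
    """
    if not isinstance(alpha, int) or alpha < 2:
        raise ValueError("this sketch only handles integer alpha >= 2")
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    Z = rng.choice([-1.0, 1.0], size=(n, num_probes))  # Rademacher probes
    Y = Z.copy()
    for _ in range(alpha):
        Y = G @ Y  # after the loop, Y = G^alpha Z
    # Average of z^T G^alpha z over all probe vectors z.
    return float(np.mean(np.sum(Z * Y, axis=0)))


def renyi_entropy(trace_power, alpha):
    """Matrix-based Renyi entropy (in bits) from a value of tr(G^alpha)."""
    return float(np.log2(trace_power) / (1.0 - alpha))


if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(500, 5))
    G = gaussian_gram(X)
    for name, fn in [("exact", exact_trace_power), ("hutchinson", hutchinson_trace_power)]:
        print(name, renyi_entropy(fn(G, 2), 2))
```

Each probe requires $\alpha$ products with $G$, so the estimate avoids the $O(n^3)$ eigendecomposition; its accuracy improves as `num_probes` grows, at a proportional increase in cost.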