Determining whether two random variables are independent or correlated, given their samples, has been a fundamental problem in statistics. However, how to do so space-efficiently when the number of states is large has not been well studied. We propose a new, simple counter matrix algorithm, which utilizes hash functions and a compressed counter matrix to give an unbiased estimate of the $\ell_2$ independence metric. With $\mathcal{O}(\epsilon^{-4}\log\delta^{-1})$ space (a very loose bound), we can guarantee $1\pm\epsilon$ multiplicative error with probability at least $1-\delta$. We also compare our algorithm with the state-of-the-art sketching-of-sketches algorithm and show that ours is effective, faster, and at least 2 times more space-efficient.
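To make the idea of a hashed counter matrix concrete, here is a minimal sketch under stated assumptions. It is an illustration, not necessarily the authors' exact algorithm. We assume the $\ell_2$ independence metric is $D = \sum_{a,b}\big(p(a,b) - p(a)\,p(b)\big)^2$ over the empirical distribution. We also assume a hypothetical structure, `CounterMatrix`: a $w \times w$ matrix where pair $(a,b)$ is hashed to row $h(a)$ and column $g(b)$ and weighted by random signs $s(a)\,t(b) \in \{\pm1\}$, plus signed row and column sums. Each cell then holds the signed sum of $p(a,b) - p(a)p(b)$ over the pairs hashed to it. Because the signs are pairwise independent, cross terms vanish in expectation, so the sum of squared cells is unbiased for $D$. The names `PolyHash`, `estimate_l2_metric`, `w`, and `reps` are our own. Taking the median over independent repetitions is a simplification of the usual median-of-means boosting.

```python
import random
from collections import Counter

P = (1 << 61) - 1  # Mersenne prime used as the field size for polynomial hashing


class PolyHash:
    """Random polynomial of degree k-1 over GF(P): a k-wise independent hash family."""

    def __init__(self, k, rng):
        self.coeffs = [rng.randrange(P) for _ in range(k)]

    def __call__(self, x):
        acc = 0
        for c in self.coeffs:  # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc


class CounterMatrix:
    """Hypothetical w x w signed counter matrix for the l2 independence metric."""

    def __init__(self, w, rng):
        self.w = w
        self.row_hash = PolyHash(2, rng)  # bucket for the first variable
        self.col_hash = PolyHash(2, rng)  # bucket for the second variable
        self.row_sign = PolyHash(4, rng)  # 4-wise independent +-1 signs
        self.col_sign = PolyHash(4, rng)
        self.C = [[0] * w for _ in range(w)]  # signed joint counts per cell
        self.R = [0] * w  # signed marginal counts per row
        self.S = [0] * w  # signed marginal counts per column
        self.n = 0

    @staticmethod
    def _sign(h, x):
        return 1 if h(x) & 1 else -1

    def update(self, a, b):
        i = self.row_hash(a) % self.w
        j = self.col_hash(b) % self.w
        sa = self._sign(self.row_sign, a)
        tb = self._sign(self.col_sign, b)
        self.C[i][j] += sa * tb
        self.R[i] += sa
        self.S[j] += tb
        self.n += 1

    def estimate(self):
        # Cell (i, j) equals the signed sum of p(a,b) - p(a)p(b) over pairs hashed there.
        n = self.n
        return sum((self.C[i][j] / n - self.R[i] * self.S[j] / n**2) ** 2
                   for i in range(self.w) for j in range(self.w))


def estimate_l2_metric(pairs, w=32, reps=9, seed=0):
    rng = random.Random(seed)
    sketches = [CounterMatrix(w, rng) for _ in range(reps)]
    for a, b in pairs:
        for sk in sketches:
            sk.update(a, b)
    ests = sorted(sk.estimate() for sk in sketches)
    return ests[len(ests) // 2]  # median of independent repetitions


def exact_l2_metric(pairs):
    """Exact D from full joint and marginal counts (for comparison only)."""
    n = len(pairs)
    joint = Counter(pairs)
    fa = Counter(a for a, _ in pairs)
    fb = Counter(b for _, b in pairs)
    return sum((joint[(a, b)] / n - fa[a] * fb[b] / n**2) ** 2
               for a in fa for b in fb)
```

The sketch stores $w^2 + 2w$ counters per repetition, independent of the number of states. The exact version needs counters for every observed pair. Larger `w` reduces collisions between heavy cells and hence the variance. The relation between `w`, `reps`, and $(\epsilon, \delta)$ is not derived here.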