We study the problem of learning a mixture of two subspaces over $\mathbb{F}_2^n$. The goal is to recover the individual subspaces $A_0$ and $A_1$, given samples from a (weighted) mixture of the uniform distributions on $A_0$ and $A_1$. This problem is computationally challenging, as it captures the notorious problem of "learning parities with noise" in the degenerate setting where $A_1 \subseteq A_0$. This is in contrast to the analogous problem over the reals, which can be solved in polynomial time (Vidal'03). This leads to the following natural question: is learning parities with noise the only computational barrier to obtaining efficient algorithms for learning mixtures of subspaces over $\mathbb{F}_2^n$? The main result of this paper is an affirmative answer to this question. Namely, we show the following results: 1. When the subspaces $A_0$ and $A_1$ are incomparable, i.e., neither is contained in the other, there is a polynomial-time algorithm to recover $A_0$ and $A_1$. 2. When $A_1$ is a subspace of $A_0$ with a significant gap in dimension, i.e., $\dim(A_1) \le \alpha \dim(A_0)$ for some $\alpha < 1$, there is an $n^{O(1/(1-\alpha))}$-time algorithm to recover $A_0$ and $A_1$. Thus, our algorithms imply computational tractability of the problem of learning mixtures of two subspaces, except in the degenerate setting captured by learning parities with noise.
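To make the sampling model concrete, the following is a minimal sketch of how samples from such a mixture could be generated. It is not the recovery algorithm of the paper. It assumes vectors in $\mathbb{F}_2^n$ are encoded as Python integer bitmasks and each subspace is given by a spanning list of such integers; the function names, the mixing weight `weight0`, and the example bases are illustrative choices, not taken from the source.

```python
import random


def sample_subspace(basis, rng):
    """Draw a uniform element of span(basis) over F_2.

    Vectors are int bitmasks (an assumed encoding). Each basis vector is
    XOR-ed in independently with probability 1/2.
    """
    v = 0
    for b in basis:
        if rng.random() < 0.5:
            v ^= b
    return v


def sample_mixture(basis0, basis1, weight0, rng):
    """Draw from A_0 with probability weight0, otherwise from A_1."""
    basis = basis0 if rng.random() < weight0 else basis1
    return sample_subspace(basis, rng)


def gf2_rank(vectors):
    """Rank over F_2 by XOR elimination, keyed on the highest set bit."""
    pivots = {}  # highest set bit -> reduced vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # clears the leading bit of v
    return len(pivots)


def in_span(v, basis):
    """True if v lies in span(basis) over F_2."""
    return gf2_rank(list(basis) + [v]) == gf2_rank(basis)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical incomparable subspaces of F_2^6 sharing one direction.
    A0 = [0b000001, 0b000010, 0b000100]
    A1 = [0b000100, 0b001000]
    samples = [sample_mixture(A0, A1, 0.5, rng) for _ in range(200)]
    print("rank of sample span:", gf2_rank(samples))
    print("samples in A_0:", sum(in_span(x, A0) for x in samples))
```

Every sample lies in $A_0 \cup A_1$, so the span of the samples is contained in $A_0 + A_1$; in the nested case $A_1 \subseteq A_0$ it is contained in $A_0$ alone. Linear algebra on the sample span therefore does not by itself separate the two components.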