We begin the study of list-decodable linear regression using batches. In this setting, only an $\alpha \in (0,1]$ fraction of the batches are genuine. Each genuine batch contains at least $n$ i.i.d. samples from a common unknown distribution, and the remaining batches may contain arbitrary or even adversarial samples. We derive a polynomial-time algorithm that, for any $n \ge \tilde{\Omega}(1/\alpha)$, returns a list of size $\mathcal{O}(1/\alpha^2)$ such that at least one item in the list is close to the true regression parameter. The algorithm requires only $\tilde{\mathcal{O}}(d/\alpha^2)$ genuine batches and works under fairly general assumptions on the distribution. The results demonstrate the utility of batch structure: it enables the first polynomial-time algorithm for list-decodable regression, a task that may be impossible in the non-batch setting, as suggested by a recent SQ lower bound \cite{diakonikolas2021statistical}.
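To make the batch setting concrete, the following is a minimal synthetic-data sketch and not the algorithm of this work. It assumes Gaussian covariates, Gaussian label noise, and corrupted batches that follow a competing parameter; the names `make_batches`, `w_true`, `w_fake`, and `noise_std` are hypothetical and chosen only for illustration.

```python
import numpy as np

def make_batches(num_batches, n, d, alpha, w_true, noise_std=0.1, seed=0):
    """Generate batches where only an alpha fraction follow y = <w_true, x> + noise.

    Assumptions (illustrative only): standard Gaussian covariates, Gaussian noise,
    and corrupted batches that use the competing parameter -w_true.
    """
    rng = np.random.default_rng(seed)
    num_genuine = int(np.ceil(alpha * num_batches))
    is_genuine = np.zeros(num_batches, dtype=bool)
    is_genuine[rng.choice(num_batches, size=num_genuine, replace=False)] = True

    w_fake = -w_true  # one simple stand-in for arbitrary corruption
    batches = []
    for genuine in is_genuine:
        X = rng.standard_normal((n, d))
        w = w_true if genuine else w_fake
        y = X @ w + noise_std * rng.standard_normal(n)
        batches.append((X, y))
    return batches, is_genuine

w_true = np.array([1.0, -2.0, 0.5])
batches, is_genuine = make_batches(num_batches=40, n=8, d=3, alpha=0.25, w_true=w_true)

# Oracle baseline: least squares pooled over the genuine batches only.
# A learner does not know is_genuine, which is why a list of candidates is returned.
X_gen = np.vstack([X for (X, _), g in zip(batches, is_genuine) if g])
y_gen = np.concatenate([y for (_, y), g in zip(batches, is_genuine) if g])
w_oracle, *_ = np.linalg.lstsq(X_gen, y_gen, rcond=None)
```

Since the genuine batches are a minority when $\alpha < 1/2$, the corrupted batches can form a consistent alternative regression model, so a single estimate cannot be guaranteed; this is the reason the output is a short list rather than one parameter.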