We provide a minimax optimal estimation procedure for F and W in matrix valued linear models Y = F W + Z where the parameter matrix W and the design matrix F are unknown but the latter takes values in a known finite set. The proposed finite alphabet linear model is justified in a variety of applications, ranging from signal processing to cancer genetics. We show that this allows to separate F and W uniquely under weak identifiability conditions, a task which is not doable, in general. To this end we quantify in the noiseless case, that is, Z = 0, the perturbation range of Y in order to obtain stable recovery of F and W. Based on this, we derive an iterative Lloyd's type estimation procedure that attains minimax estimation rates for W and F for Gaussian error matrix Z. In contrast to the least squares solution the estimation procedure can be computed efficiently and scales linearly with the total number of observations. We confirm our theoretical results in a simulation study and illustrate it with a genetic sequencing data example.
翻译:我们为F和W提供了一种最小的最佳估计程序,其矩阵价值为线性模型Y=F W + Z,其中参数矩阵W和设计矩阵F未知,但后者的数值为已知的有限数组。提议的有限字母线性模型在从信号处理到癌症遗传学等各种应用中是合理的。我们表明,这允许在微弱的可识别性条件下将F和W单独区分开来,一般来说,这项任务是行不通的。为此,我们在无噪音案例中量化了Y的扰动范围,即Z=0,以便稳定恢复F和W。基于这一点,我们得出了反复的劳埃德类型估计程序,该程序达到高斯误差矩阵Z的W和F的微量估计率。与最小方的可辨识性解决方案相比,可以高效地计算估计程序,以观察总数为线性尺度。我们通过模拟研究来确认我们的理论结果,并以基因测序数据为例。