We study the following problem: given a variable of interest, we seek the best linear predictor for it using a subset of $k$ relevant variables that satisfies a matroid constraint. This problem naturally generalizes subset selection to settings in which observations must be spread across multiple classes. We derive new, strengthened guarantees for this problem by improving the analysis of the residual random greedy algorithm and by developing a novel distorted local-search algorithm. To quantify our approximation guarantees, we refine the definition of weak submodularity by Das and Kempe and introduce the notion of an upper submodularity ratio, which we connect to the minimum $k$-sparse eigenvalue of the covariance matrix. More generally, we consider the problem of maximizing a set function $f$ with lower and upper submodularity ratios $\gamma$ and $\beta$ under a matroid constraint. For this problem, our algorithms achieve asymptotic approximation guarantees of $1/2$ and $1-e^{-1}$, respectively, as the function approaches submodularity. As a second application, we show that the Bayesian A-optimal design objective falls into our framework, leading to new guarantees for this problem as well.
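To make the setting concrete, the following is a minimal sketch of the textbook residual random greedy procedure applied to subset selection for regression. It is an illustration under stated assumptions, not the analysis or implementation developed in this work: the matroid is assumed to be a partition matroid (each variable belongs to one class, with a per-class capacity), the objective is assumed to be the $R^2$ of a least-squares fit on centered data, and the names `r_squared`, `residual_random_greedy`, `groups`, and `capacity` are hypothetical.

```python
import numpy as np


def r_squared(X, y, S):
    """R^2 of the least-squares fit of y on the columns of X indexed by S.

    Assumes y (and the columns of X) are centered and y is not all zeros.
    """
    if len(S) == 0:
        return 0.0
    Xs = X[:, list(S)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return 1.0 - float(resid @ resid) / float(y @ y)


def residual_random_greedy(X, y, groups, capacity, rng):
    """Residual random greedy under an (assumed) partition matroid.

    groups[j]   : class label of column j
    capacity[g] : maximum number of columns that may be chosen from class g
    Each round builds a maximum-weight base of the contracted matroid,
    weighted by marginal gains, and adds one of its elements uniformly
    at random.
    """
    n = X.shape[1]
    S = []
    remaining = dict(capacity)
    while True:
        base_value = r_squared(X, y, S)
        base = []
        for g, r in remaining.items():
            if r == 0:
                continue
            members = [j for j in range(n) if groups[j] == g and j not in S]
            # Rank unused members of class g by marginal gain given S.
            members.sort(
                key=lambda j: r_squared(X, y, S + [j]) - base_value,
                reverse=True,
            )
            # For a partition matroid, the top-r elements per class form
            # a maximum-weight base of the contraction.
            base.extend(members[:r])
        if not base:
            break
        pick = int(rng.choice(base))
        S.append(pick)
        remaining[groups[pick]] -= 1
    return S
```

The per-class capacities play the role of the requirement that selected variables be spread across classes; a different matroid would only change how the maximum-weight base is computed in each round.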