In this paper, we propose a new framework for designing fast parallel algorithms for fundamental statistical subset selection tasks, including feature selection and experimental design. Such tasks are known to be weakly submodular and are amenable to optimization via the standard greedy algorithm. Despite its desirable approximation guarantees, the greedy algorithm is inherently sequential: in the worst case, its parallel runtime is linear in the size of the data. Recently, there has been a surge of interest in a parallel optimization technique called adaptive sampling, which produces solutions with desirable approximation guarantees for submodular maximization in exponentially faster parallel runtime. Unfortunately, we show that for general weakly submodular functions such accelerations are impossible. The major contribution of this paper is a novel relaxation of submodularity, which we call differential submodularity. We first prove that differential submodularity characterizes objectives like feature selection and experimental design. We then design an adaptive sampling algorithm for differentially submodular functions whose parallel runtime is logarithmic in the size of the data and which achieves strong approximation guarantees. Through experiments, we show that the algorithm's performance is competitive with state-of-the-art methods and that it obtains dramatic speedups for feature selection and experimental design problems.
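To make the sequential bottleneck of the greedy algorithm concrete, here is a minimal sketch of greedy forward feature selection. It assumes the objective is the fraction of variance explained (R²) by a least-squares fit on the selected columns, with centered data and no intercept; the abstract does not specify the exact objective, and the helper names `r_squared` and `greedy_select` are hypothetical. This is the baseline the abstract contrasts against, not the paper's adaptive sampling algorithm. Within one iteration, every candidate's marginal gain can be evaluated in parallel. However, each iteration depends on the element chosen in the previous one, so selecting k features takes k adaptive rounds.

```python
import numpy as np


def r_squared(X, y, S):
    """Fraction of the variance of (centered) y explained by least squares on columns S.

    Assumption: X and y are centered, so no intercept term is fitted.
    """
    if not S:
        return 0.0
    XS = X[:, sorted(S)]
    coef, *_ = np.linalg.lstsq(XS, y, rcond=None)
    resid = y - XS @ coef
    return 1.0 - (resid @ resid) / (y @ y)


def greedy_select(X, y, k):
    """Standard greedy: add the feature with the largest marginal gain, k times.

    Returns the selected indices and the number of sequential (adaptive) rounds.
    """
    S, rounds = [], 0
    for _ in range(k):
        current = r_squared(X, y, S)
        # These evaluations are independent of each other and could run in parallel...
        gains = {
            j: r_squared(X, y, S + [j]) - current
            for j in range(X.shape[1])
            if j not in S
        }
        # ...but the next round cannot start until this choice is made.
        best = max(gains, key=gains.get)
        S.append(best)
        rounds += 1
    return S, rounds


# Usage: synthetic data where y depends exactly on features 0 and 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
X = X - X.mean(axis=0)  # center the columns
y = 3.0 * X[:, 0] - 2.0 * X[:, 3]
S, rounds = greedy_select(X, y, k=2)
print(S, rounds, r_squared(X, y, S))
```

The number of rounds grows linearly with k, and k can be as large as the number of features. Adaptive sampling methods aim to reduce this to a number of rounds that is logarithmic in the size of the data.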