We consider the problem of projecting a vector onto the so-called k-capped simplex, which is a hypercube cut by a hyperplane. For an n-dimensional input vector with bounded elements, we find that a simple algorithm based on Newton's method solves the projection problem to high precision with a complexity of roughly O(n). This is a much lower computational cost than that of the existing sorting-based methods proposed in the literature. We provide a theory that partially explains and justifies the method. We demonstrate that the proposed algorithm produces high-precision solutions of the projection problem on large-scale datasets. It also significantly outperforms state-of-the-art methods in runtime: it is about 6-8 times faster in CPU time than a commercial software package for input vectors with 1 million or more variables. We further illustrate the effectiveness of the proposed algorithm for sparse regression in a bioinformatics problem. We used the proposed method to accelerate the Projected Quasi-Newton (PQN) method. Empirical results on the GWAS dataset (with 1,500,000 single-nucleotide polymorphisms) show that the accelerated PQN algorithm can handle huge-scale regression problems. It is also more efficient (about 3-6 times faster) than the current state-of-the-art methods.
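The abstract does not spell out the algorithm, so what follows is only an illustration under a standard formulation. That formulation is an assumption here. The k-capped simplex is taken to be {x in R^n : 0 <= x_i <= 1, sum_i x_i = k}, that is, the hypercube [0,1]^n cut by the hyperplane sum_i x_i = k. The Euclidean projection of y onto this set has the form x_i = min(1, max(0, y_i - lam)). Here the scalar lam is a root of the piecewise-linear, nonincreasing function f(lam) = sum_i min(1, max(0, y_i - lam)) - k. The NumPy sketch below applies Newton steps to f, using the number of coordinates strictly inside (0, 1) as -f'(lam). The function name `project_capped_simplex`, the starting point, and the bisection safeguard are hypothetical choices of ours and are not taken from the paper. Each iteration costs O(n), which is consistent with the roughly O(n) behaviour claimed above when only a few iterations are needed. It should not be read as the authors' exact method.

```python
import numpy as np


def project_capped_simplex(y, k, tol=1e-9, max_iter=100):
    """Euclidean projection of y onto {x : 0 <= x_i <= 1, sum(x) = k}.

    Hypothetical sketch: finds the shift lam with sum(clip(y - lam, 0, 1)) = k
    by Newton steps on this piecewise-linear function, falling back to
    bisection whenever a Newton step leaves the current bracket.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n for a nonempty feasible set")

    def f(lam):
        # Residual of the constraint sum(x) = k for x = clip(y - lam, 0, 1).
        return np.clip(y - lam, 0.0, 1.0).sum() - k

    # f is nonincreasing: f(min(y) - 1) = n - k >= 0 and f(max(y)) = -k <= 0,
    # so a root lies in [lo, hi].
    lo, hi = y.min() - 1.0, y.max()
    # Assumed starting point: the root if no coordinate were clipped.
    lam = min(max((y.sum() - k) / n, lo), hi)

    for _ in range(max_iter):
        r = f(lam)
        if abs(r) <= tol:
            break
        # Shrink the bracket using the sign of the residual.
        if r > 0:
            lo = lam
        else:
            hi = lam
        # Slope of f at lam is minus the count of "free" coordinates.
        free = np.count_nonzero((y - lam > 0.0) & (y - lam < 1.0))
        step = lam + r / free if free > 0 else None
        # Accept the Newton step only if it stays strictly inside the bracket.
        lam = step if step is not None and lo < step < hi else 0.5 * (lo + hi)

    return np.clip(y - lam, 0.0, 1.0)
```

For instance, `project_capped_simplex([2.0, -1.0, 0.3], 1)` reaches the root after a single Newton step and returns an array equal to [1, 0, 0]. The bisection fallback only guarantees termination. The near-linear cost relies on Newton needing only a few steps, which is the behaviour the abstract reports empirically for bounded inputs.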