Sparse Conditional Random Field (CRF) is a powerful technique in computer vision and natural language processing for structured prediction. However, solving sparse CRFs in large-scale applications remains challenging. In this paper, we propose a novel safe dynamic screening method that exploits an accurate dual optimum estimation to identify and remove the irrelevant features during the training process. Thus, the problem size can be reduced continuously, leading to great savings in the computational cost without sacrificing any accuracy on the finally learned model. To the best of our knowledge, this is the first screening method which introduces the dual optimum estimation technique -- by carefully exploring and exploiting the strong convexity and the complex structure of the dual problem -- in static screening methods to dynamic screening. In this way, we can absorb the advantages of both the static and dynamic screening methods and avoid their drawbacks. Our estimation would be much more accurate than those developed based on the duality gap, which contributes to a much stronger screening rule. Moreover, our method is also the first screening method in sparse CRFs and even structure prediction models. Experimental results on both synthetic and real-world datasets demonstrate that the speedup gained by our method is significant.
翻译:简单条件随机场(CRF)是计算机视觉和自然语言处理系统化预测的有力技术。然而,在大规模应用中,解决分散的通用报告格式问题仍具有挑战性。在本文中,我们提出一种新的安全动态筛选方法,利用准确的双重最佳估计,在培训过程中查明和消除不相干的特点。因此,问题的规模可以不断缩小,从而在不牺牲最后获得的模型的任何准确性的情况下,大大节省计算费用。据我们所知,这是在静态筛选方法到动态筛选中,通过仔细探索和利用双重问题的强稳和复杂结构,引入双重最佳估算技术的第一种筛选方法。这样,我们可以吸收静态和动态筛选方法的优势,避免其缺陷。我们的估算将比基于双重性差距而开发的精确得多,这有助于形成更强大的筛选规则。此外,我们的方法也是在稀疏的通用报告格式甚至结构预测模型中的第一个筛选方法。合成和现实世界数据集的实验结果表明,我们方法获得的速度是显著的。