Embedding approaches have become one of the most pervasive techniques for multi-label classification. However, the training process of embedding methods usually involves a complex quadratic or semidefinite programming problem, or the model may even involve an NP-hard problem. Thus, such methods are prohibitive on large-scale applications. More importantly, much of the literature has already shown that the binary relevance (BR) method is usually good enough for some applications. Unfortunately, BR runs slowly due to its linear dependence on the size of the input data. The goal of this paper is to provide a simple method, yet with provable guarantees, which can achieve competitive performance without a complex training process. To achieve our goal, we provide a simple stochastic sketch strategy for multi-label classification and present theoretical results from both algorithmic and statistical learning perspectives. Our comprehensive empirical studies corroborate our theoretical findings and demonstrate the superiority of the proposed methods.
翻译:嵌入方法已成为多标签分类的最普遍技术之一,然而,嵌入方法的培训过程通常涉及复杂的二次或半无限期的编程问题,或模型甚至可能涉及NP硬性问题。因此,这类方法对大规模应用来说是令人望而却步的。更重要的是,许多文献已经表明,二进制(BR)方法对于某些应用来说通常足够好。不幸的是,BR运行缓慢,因为它的线性依赖输入数据的规模。本文的目标是提供一个简单的方法,但有可验证的保证,在没有复杂的培训过程的情况下,可以取得竞争性的性能。为了实现我们的目标,我们为多标签分类提供了简单、随机的草图战略,并从算法和统计学习的角度提出了理论结果。我们的全面经验研究证实了我们的理论发现,并展示了拟议方法的优势。