We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the first four methods, the classification is based on a reduction to binary classification. We consider the case where the binary classifier comes from a class of VC dimension $d$, and in particular from the class of halfspaces over $\reals^d$. We analyze both the estimation error and the approximation error of these methods. Our analysis reveals interesting conclusions of practical relevance regarding the success of the different approaches under various conditions. Our proof technique employs tools from VC theory to analyze the \emph{approximation error} of hypothesis classes. This is in sharp contrast to most, if not all, previous uses of VC theory, which deal only with estimation error.
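To make the reduction-to-binary idea concrete, here is a minimal sketch (not the paper's construction, and entirely hypothetical code) of the One vs. All scheme over halfspaces: one binary perceptron is trained per class against all other classes, and a test point is assigned to the class whose halfspace gives the largest margin. All function names and the toy data are illustrative assumptions.

```python
# Hypothetical illustration of the One vs. All reduction with halfspaces,
# using a simple perceptron as the binary learner.
import numpy as np

def train_perceptron(X, y, epochs=50):
    """Simple perceptron for labels in {-1, +1}; returns a weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:  # misclassified (or on the boundary)
                w = w + yi * xi
    return w

def one_vs_all_fit(X, y, num_classes):
    # One binary problem per class: class k vs. the rest.
    return np.stack([
        train_perceptron(X, np.where(y == k, 1.0, -1.0))
        for k in range(num_classes)
    ])

def one_vs_all_predict(W, X):
    # Arbitrate between the binary predictors by the largest margin.
    return np.argmax(X @ W.T, axis=1)

# Toy, well-separated 3-class problem in the plane (bias feature appended
# so the halfspaces need not pass through the origin).
rng = np.random.default_rng(0)
centers = np.array([[4.0, 0.0], [-4.0, 4.0], [-4.0, -4.0]])
y = rng.integers(0, 3, size=150)
X = centers[y] + 0.3 * rng.standard_normal((150, 2))
X = np.hstack([X, np.ones((150, 1))])

W = one_vs_all_fit(X, y, num_classes=3)
acc = np.mean(one_vs_all_predict(W, X) == y)
print(f"training accuracy: {acc:.2f}")
```

The approximation-error question studied in the abstract asks, in this setting, how expressive such argmax-of-halfspaces predictors are relative to the best multiclass rule, independently of the sample size.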