Error-correcting codes (ECC) are used to reduce multiclass classification tasks to multiple binary classification subproblems. In ECC, classes are represented by the rows of a binary matrix, which correspond to codewords in a codebook. Codebooks are commonly either predefined or problem-dependent. Given a predefined codebook, the codeword-to-class assignment is traditionally overlooked, and codewords are implicitly assigned to classes arbitrarily. Our paper shows that these assignments play a major role in the performance of ECC. Specifically, we examine similarity-preserving assignments, where similar codewords are assigned to similar classes. Addressing a controversy in the existing literature, our extensive experiments confirm that similarity-preserving assignments induce easier subproblems and generalize better than other assignment policies. We find that similarity-preserving assignments make predefined codebooks problem-dependent without altering other favorable codebook properties. Finally, we show that our findings can improve predefined codebooks dedicated to extreme classification.
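To make the idea of a codeword-to-class assignment concrete, here is a minimal Python sketch. It is not the paper's method: the toy codebook, the class similarity matrix, the cost function, and the brute-force search over permutations are all assumptions chosen for illustration. Each class is given one row of a binary codebook, a prediction is decoded to the nearest codeword in Hamming distance, and an assignment is scored by how far apart the codewords of similar classes are (lower is more similarity-preserving).

```python
from itertools import permutations


def hamming(a, b):
    """Number of positions where two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, codebook):
    """Index of the codeword closest (in Hamming distance) to the predicted bits."""
    return min(range(len(codebook)), key=lambda i: hamming(bits, codebook[i]))


def assignment_cost(perm, codebook, similarity):
    """Similarity-weighted sum of Hamming distances between assigned codewords.

    perm[i] is the index of the codeword assigned to class i.
    This cost is a hypothetical objective, not one taken from the paper.
    """
    n = len(perm)
    return sum(
        similarity[i][j] * hamming(codebook[perm[i]], codebook[perm[j]])
        for i in range(n)
        for j in range(i + 1, n)
    )


def similarity_preserving_assignment(codebook, similarity):
    """Brute-force search over all assignments; only feasible for a few classes."""
    n = len(codebook)
    return min(
        permutations(range(n)),
        key=lambda p: assignment_cost(p, codebook, similarity),
    )


# Toy predefined codebook: one row (codeword) per class, one column per binary subproblem.
CODEBOOK = [(0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]

# Toy class similarity (assumed): classes 0 and 1 are similar, classes 2 and 3 are similar.
SIMILARITY = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

arbitrary = tuple(range(len(CODEBOOK)))  # class i simply gets codeword i
best = similarity_preserving_assignment(CODEBOOK, SIMILARITY)

print("arbitrary cost:", assignment_cost(arbitrary, CODEBOOK, SIMILARITY))
print("best assignment:", best, "cost:", assignment_cost(best, CODEBOOK, SIMILARITY))
```

In this toy setup the codebook itself never changes; only the mapping from classes to rows does, which is the sense in which an assignment can adapt a predefined codebook to a problem. Exhaustive search grows factorially with the number of classes, so a realistic setting would need a heuristic or an approximate matching procedure instead.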