We present a new perspective on the popular multi-class algorithmic techniques of one-vs-all and error correcting output codes. Rather than studying the behavior of these techniques for supervised learning, we establish a connection between the success of these methods and the existence of label-efficient learning procedures. We show that in both the realizable and agnostic cases, if output codes are successful at learning from labeled data, they implicitly assume structure on how the classes are related. By making that structure explicit, we design learning algorithms to recover the classes with low label complexity. We provide results for the commonly studied cases of one-vs-all learning and when the codewords of the classes are well separated. We additionally consider the more challenging case where the codewords are not well separated, but satisfy a boundary features condition that captures the natural intuition that every bit of the codewords should be significant.
翻译:我们对一五一和错误校正输出代码的流行多级算法技术提出了新的观点。我们不是研究这些技术的行为,而是研究这些技术在监督学习方面的行为,而是将这些方法的成功与存在标签效率高的学习程序联系起来。我们表明,在可变和不可知的情况下,如果产出代码成功地从标签数据中学习,它们就隐含地假定了各个类别之间的结构。通过明确这一结构,我们设计了学习算法,以恢复低标签复杂性的班级。我们为通常研究的一五一学习案例和班级的编码词完全分开时提供了结果。我们另外考虑了一个更具有挑战性的案例,即代码字没有很好地分开,但满足了一种边界特征的条件,它捕捉了自然直觉,即每个字的字眼都应该重要。