In this paper we study multiclass classification with a bounded number $k$ of labels in the realizable setting. We extend the traditional PAC model to (a) distribution-dependent learning rates and (b) learning rates under data-dependent assumptions. First, we consider the universal learning setting (Bousquet, Hanneke, Moran, van Handel and Yehudayoff, STOC '21), for which we provide a complete characterization of the achievable learning rates that holds for every fixed distribution. In particular, we show the following trichotomy: for any concept class, the optimal learning rate is either exponential, linear, or arbitrarily slow. We also provide complexity measures of the underlying hypothesis class that characterize when each of these rates occurs. Second, we consider multiclass classification with structured data (such as data lying on a low-dimensional manifold or satisfying margin conditions), a setting captured by partial concept classes (Alon, Hanneke, Holzman and Moran, FOCS '21). Partial concepts are functions that may be undefined on certain parts of the input space. We extend the traditional PAC learnability of total concept classes to partial concept classes in the multiclass setting and investigate the differences between partial and total concepts.
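As a rough sketch of what the trichotomy means, following the usual formulation of the universal learning framework cited above: a class is learnable at rate $R$ if there is a learner $\hat{h}_n$, trained on $n$ samples, such that for every realizable distribution $P$ there are constants $C, c > 0$ with the bound below. The symbols $\hat{h}_n$, $\mathrm{er}_P$, $R$, $C$ and $c$ are notation assumed here for illustration and do not appear in the abstract.

$$
\mathbb{E}\big[\mathrm{er}_P(\hat{h}_n)\big] \;\le\; C\, R(c\, n) \quad \text{for all } n,
\qquad
R(n) = e^{-n} \ \text{(exponential)}, \qquad R(n) = \tfrac{1}{n} \ \text{(linear)}.
$$

The constants $C$ and $c$ may depend on $P$, which is what separates this from the PAC model, where the bound must hold uniformly over all realizable distributions. In this reading, "arbitrarily slow" means that for every rate $R(n) \to 0$ the class cannot be learned faster than $R$.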