A large-scale deep model pre-trained on massive labeled or unlabeled data transfers well to downstream tasks. Linear evaluation freezes the parameters of the pre-trained model and trains a linear classifier separately, which is efficient and attractive for transfer. However, little work has investigated the classifier in linear evaluation beyond the default logistic regression. Inspired by the statistical efficiency of naive Bayes, this paper revisits the classical topic of discriminative vs. generative classifiers. Theoretically, the paper considers the surrogate loss instead of the zero-one loss in the analysis and generalizes the classical results from binary cases to multiclass ones. We show that, under mild assumptions, multiclass naive Bayes requires $O(\log n)$ samples to approach its asymptotic error, while the corresponding multiclass logistic regression requires $O(n)$ samples, where $n$ is the feature dimension. To establish this, we present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for the logistic loss, which are of independent interest. Simulation results on a mixture of Gaussians validate our theoretical findings. Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the amount of data increases. Moreover, naive Bayes shows promise in few-shot cases, and we observe the ``two regimes'' phenomenon in pre-trained supervised models. Our code is available at https://github.com/ML-GSAI/Revisiting-Dis-vs-Gen-Classifiers.
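To make the sample-efficiency claim concrete, the following is a minimal sketch of a mixture-of-Gaussians simulation in the spirit of the one described above; it is not the authors' exact setup, and the dimension, class means, and sample sizes are illustrative assumptions. It compares how quickly Gaussian naive Bayes and logistic regression approach their asymptotic test error as the training set grows.

```python
# Hedged sketch: compare the sample efficiency of naive Bayes vs. logistic
# regression on synthetic class-conditional Gaussian data. The feature
# dimension, number of classes, class means, and sample sizes below are
# illustrative choices, not the paper's configuration.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_features, n_classes = 100, 5

# One random mean per class, identity covariance (class-conditional Gaussians).
means = rng.normal(scale=1.0, size=(n_classes, n_features))

def sample(n_per_class):
    # Draw n_per_class points from each class-conditional Gaussian.
    X = np.vstack([mu + rng.normal(size=(n_per_class, n_features)) for mu in means])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y

X_test, y_test = sample(1000)

for n_per_class in [5, 20, 80, 320, 1280]:
    X, y = sample(n_per_class)
    nb_err = 1 - GaussianNB().fit(X, y).score(X_test, y_test)
    lr_err = 1 - LogisticRegression(max_iter=2000).fit(X, y).score(X_test, y_test)
    print(f"n/class={n_per_class:5d}  naive Bayes err={nb_err:.3f}  logistic err={lr_err:.3f}")
```

Under these assumptions, naive Bayes typically reaches a low test error with far fewer samples than logistic regression, which is the qualitative behavior the $O(\log n)$ vs. $O(n)$ comparison predicts.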