Using a taxonomy to organize information requires classifying objects (documents, images, etc.) with appropriate taxonomic classes. The flexible nature of zero-shot learning is appealing for this task because it allows classifiers to adapt naturally to taxonomy modifications. This work studies zero-shot multi-label document classification with fine-tuned language models under realistic taxonomy expansion scenarios in the human resources domain. Experiments show that zero-shot learning can be highly effective in this setting. When controlling for training data budget, zero-shot classifiers achieve a 12% relative increase in macro-AP compared to a traditional multi-label classifier trained on all classes. Counterintuitively, these results suggest that in some settings it would be preferable to adopt zero-shot techniques and spend resources annotating more documents with an incomplete set of classes, rather than spreading the labeling budget uniformly over all classes and using traditional classification techniques. Additional experiments demonstrate that adopting the well-known filter/re-rank decomposition from the recommender systems literature can significantly reduce the computational burden of high-performance zero-shot classifiers, empirically yielding a 98% reduction in computational overhead for only a 2% relative decrease in performance. The evidence presented here demonstrates that zero-shot learning has the potential to significantly increase the flexibility of taxonomies and highlights directions for future research.
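
The filter/re-rank decomposition mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function `filter_then_rerank`, the dot-product filter, the toy class names, and the lookup table standing in for an expensive zero-shot scorer are all assumptions made for illustration. The idea is that a cheap scorer ranks every class, and the costly model is run only on the top `k` survivors.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product, used here as the cheap first-stage similarity."""
    return sum(x * y for x, y in zip(a, b))


def filter_then_rerank(
    doc_vec: Vector,
    class_vecs: Dict[str, Vector],
    expensive_score: Callable[[str], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k filtered classes, ordered by the expensive score."""
    # Stage 1 (filter): score every class with the cheap similarity
    # and keep only the k best candidates.
    by_cheap_score = sorted(
        class_vecs,
        key=lambda name: dot(doc_vec, class_vecs[name]),
        reverse=True,
    )
    candidates = by_cheap_score[:k]

    # Stage 2 (re-rank): call the expensive scorer on the k survivors only.
    reranked = [(name, expensive_score(name)) for name in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked


# Toy data (hypothetical): four classes with 2-d embeddings.
class_vecs = {
    "payroll": [1.0, 0.0],
    "recruiting": [0.8, 0.6],
    "benefits": [0.0, 1.0],
    "onboarding": [-1.0, 0.0],
}
doc_vec = [0.9, 0.1]

# Stand-in for an expensive model (assumption): a lookup table that
# also records which classes it was asked to score.
expensive_table = {
    "payroll": 0.4,
    "recruiting": 0.7,
    "benefits": 0.9,
    "onboarding": 0.1,
}
calls: List[str] = []


def expensive_score(name: str) -> float:
    calls.append(name)
    return expensive_table[name]


ranked = filter_then_rerank(doc_vec, class_vecs, expensive_score, k=2)
print(ranked)      # classes that survived the filter, re-ranked
print(len(calls))  # number of expensive calls made
```

In this toy run the expensive scorer is called twice instead of four times. The class `benefits` would have received the highest expensive score but never reaches the second stage, which shows the kind of trade-off between compute and quality that such a decomposition involves; the choice of `k` controls that trade-off.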