Few-Shot Text Classification (FSTC) imitates humans by efficiently learning a new text classifier from only a few examples, leveraging prior knowledge from historical tasks. However, most prior works assume that all tasks are sampled from a single data source, an assumption that does not hold in real-world scenarios where tasks are heterogeneous and follow different distributions. As a result, existing methods, which rely on a globally shared knowledge mechanism, may struggle to handle task heterogeneity. Moreover, inherent task relations are not explicitly captured, leaving task knowledge unorganized and hard to transfer to new tasks. We therefore explore a new FSTC setting in which tasks can come from a diverse range of data sources. To address task heterogeneity, we propose a self-supervised hierarchical task clustering (SS-HTC) method. SS-HTC not only customizes cluster-specific knowledge by dynamically organizing heterogeneous tasks into clusters at hierarchical levels, but also disentangles the underlying relations between tasks to improve interpretability. Extensive experiments on five public FSTC benchmark datasets demonstrate the effectiveness of SS-HTC.
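The abstract does not say how SS-HTC assigns tasks to clusters. The sketch below is only a rough illustration of the general idea of organizing tasks into clusters at hierarchical levels; it is not the paper's method. Everything here is an illustrative assumption: the function names, the mean-pooled task representation, the softmax-over-negative-distance soft assignment, and the fixed centroids. The self-supervised objective is not modeled at all.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Average equal-length vectors (assumed: support-example embeddings of one task)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_assign(point: Vector, centroids: Sequence[Vector], temperature: float = 1.0) -> List[float]:
    """Softmax over negative squared distances: closer centroids receive more weight."""
    logits = [-sq_dist(point, c) / temperature for c in centroids]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    """Mix centroids according to their assignment weights."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(xs)), key=xs.__getitem__)


def hierarchical_task_representation(
    support_embeddings: Sequence[Vector],
    level_centroids: Sequence[Sequence[Vector]],
    temperature: float = 1.0,
) -> Tuple[Vector, List[List[float]]]:
    """Route a task through successive levels of clusters (hypothetical scheme).

    At each level, the current representation is softly assigned to that level's
    centroids, and the weighted centroid mixture becomes the input to the next level.
    Returns the final representation and the per-level assignment weights.
    """
    h = mean_vector(support_embeddings)
    assignments: List[List[float]] = []
    for centroids in level_centroids:
        weights = soft_assign(h, centroids, temperature)
        assignments.append(weights)
        h = weighted_sum(weights, centroids)
    return h, assignments


if __name__ == "__main__":
    # Toy 2-D setup: four fine-grained clusters under two coarse ones.
    level1 = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    level2 = [[0.0, 0.5], [5.0, 5.5]]
    _, a = hierarchical_task_representation([[0.1, 0.0], [-0.1, 0.2]], [level1, level2])
    _, b = hierarchical_task_representation([[5.0, 5.2], [5.2, 5.0]], [level1, level2])
    print([argmax(w) for w in a])  # prints [0, 0]
    print([argmax(w) for w in b])  # prints [2, 1]
```

In this toy run, two tasks drawn from different regions of the embedding space land in different clusters at both levels. That is the kind of cluster-specific routing a single globally shared knowledge mechanism cannot express. Soft rather than hard assignments are used here so that every step stays differentiable, which is a common design choice for clustering inside a meta-learner.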