We present a neural semi-supervised learning model termed Self-Pretraining. Our model is inspired by the classic self-training algorithm. However, as opposed to self-training, Self-Pretraining is threshold-free, can potentially update its belief about previously labeled documents, and can cope with the semantic drift problem. Self-Pretraining is iterative and consists of two classifiers. In each iteration, one classifier draws a random set of unlabeled documents and labels them. This set is used to initialize the second classifier, which is then further trained on the set of labeled documents. The algorithm proceeds to the next iteration, in which the classifiers' roles are reversed. To improve the flow of information across iterations and to cope with the semantic drift problem, Self-Pretraining employs an iterative distillation process, transfers hypotheses across iterations, utilizes a two-stage training model, uses an efficient learning rate schedule, and employs a pseudo-label transformation heuristic. We have evaluated our model on three publicly available social media datasets. Our experiments show that Self-Pretraining outperforms existing state-of-the-art semi-supervised classifiers across multiple settings. Our code is available at https://github.com/p-karisani/self_pretraining.
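The following is a minimal, illustrative sketch of the alternating two-classifier loop described above, under several assumptions: the helper `make_classifier`, the `SGDClassifier` stand-in for the paper's neural models, and the constants `n_iterations` and `sample_size` are hypothetical. The actual method additionally uses iterative distillation, hypothesis transfer, the two-stage training model, the learning rate schedule, and the pseudo-label transformation heuristic, none of which are shown here.

```python
# Illustrative sketch only; not the authors' implementation.
import numpy as np
from sklearn.linear_model import SGDClassifier


def make_classifier():
    # Placeholder for the paper's neural classifier (hypothetical choice).
    return SGDClassifier(loss="log_loss", random_state=0)


def self_pretraining(X_labeled, y_labeled, X_unlabeled,
                     n_iterations=10, sample_size=100, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y_labeled)

    # Initial classifier trained on the labeled documents.
    teacher = make_classifier()
    teacher.partial_fit(X_labeled, y_labeled, classes=classes)

    for _ in range(n_iterations):
        # 1) One classifier draws a random set of unlabeled documents and labels them.
        idx = rng.choice(len(X_unlabeled),
                         size=min(sample_size, len(X_unlabeled)),
                         replace=False)
        pseudo_labels = teacher.predict(X_unlabeled[idx])

        # 2) The pseudo-labeled set initializes ("pretrains") the second classifier ...
        student = make_classifier()
        student.partial_fit(X_unlabeled[idx], pseudo_labels, classes=classes)

        # 3) ... which is then further trained on the labeled documents.
        student.partial_fit(X_labeled, y_labeled)

        # 4) The classifiers' roles are reversed for the next iteration.
        teacher = student

    return teacher
```

Note that no confidence threshold is applied when selecting unlabeled documents, and each iteration starts from a fresh classifier, which is what allows earlier pseudo-labels to be revised.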