We present FewShotTextGCN, a novel method designed to effectively exploit the properties of word-document graphs for improved learning in low-resource settings. We introduce K-hop Neighbourhood Regularization, a regularizer for heterogeneous graphs, and show that it stabilizes and improves learning when only a few training samples are available. We furthermore propose a simplification of the graph-construction method that yields a graph $\sim$7 times less dense and improves performance in low-resource settings while remaining on par with the state of the art in high-resource settings. Finally, we introduce a new variant of Adaptive Pseudo-Labeling tailored to word-document graphs. When using as few as 20 training samples, we outperform a strong TextGCN baseline by 17% absolute accuracy on average over eight languages. We demonstrate that our method can be applied to document classification without any language model pretraining on a wide range of typologically diverse languages, while performing on par with large pretrained language models.