Author profiling classifies author characteristics by analyzing how language is shared among people. In this work, we study that task from a low-resource viewpoint: using little or no training data. We explore different zero- and few-shot models based on entailment and evaluate our systems on several profiling tasks in Spanish and English. In addition, we study the effect of both the entailment hypothesis and the size of the few-shot training sample. We find that entailment-based models outperform supervised text classifiers based on XLM-RoBERTa and that, on average, we can reach 80% of the accuracy of previous approaches using less than 50% of the training data.
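
As a rough illustration of the entailment framing, the sketch below turns each candidate author label into a hypothesis sentence and picks the label whose hypothesis is most strongly entailed by the text. It is a minimal sketch under stated assumptions, not the systems evaluated in this work: the names `zero_shot_profile`, `predict`, and `toy_scorer`, the hypothesis template, and the word-overlap scorer are all hypothetical. In practice the scorer would be a trained natural language inference model.

```python
from typing import Callable, Dict, Sequence

# Assumed interface: (premise, hypothesis) -> entailment score in [0, 1].
EntailmentScorer = Callable[[str, str], float]

# Hypothetical template; the wording of the hypothesis is a design choice.
DEFAULT_TEMPLATE = "The author of this text is {}."


def zero_shot_profile(
    text: str,
    labels: Sequence[str],
    scorer: EntailmentScorer,
    template: str = DEFAULT_TEMPLATE,
) -> Dict[str, float]:
    """Score each label by how strongly the text entails its hypothesis."""
    scores = {label: scorer(text, template.format(label)) for label in labels}
    total = sum(scores.values())
    if total > 0:
        # Normalise so the scores over the candidate labels sum to 1.
        return {label: score / total for label, score in scores.items()}
    # No hypothesis received any support: fall back to a uniform distribution.
    return {label: 1.0 / len(labels) for label in labels}


def predict(
    text: str,
    labels: Sequence[str],
    scorer: EntailmentScorer,
    template: str = DEFAULT_TEMPLATE,
) -> str:
    """Return the label with the highest normalised entailment score."""
    probs = zero_shot_profile(text, labels, scorer, template)
    return max(probs, key=probs.get)


def toy_scorer(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an NLI model: fraction of hypothesis words in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().rstrip(".").split())
    return len(premise_words & hypothesis_words) / len(hypothesis_words)


if __name__ == "__main__":
    sample = "I am a young student and I love football"
    print(predict(sample, ["young", "old"], toy_scorer))
```

Because the template is an argument, the same function can be rerun with different hypothesis wordings to see how sensitive the prediction is to that choice.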