We approach the classification problem as an entailment problem and apply zero-shot ranking to socio-political texts. Documents ranked at the top can be treated as positively classified, which reduces the close-reading time of the information extraction process. We use Transformer language models to obtain entailment probabilities and investigate different types of queries. We find that DeBERTa achieves higher mean average precision scores than RoBERTa, and that using the declarative form of the class label as a query outperforms using the dictionary definition of the class label. We show that close-reading time can be reduced by reading only a top percentage of the ranked documents, where that percentage depends on the recall one wants to achieve. However, our findings also show that the percentage of documents that must be read increases as the topic gets broader.
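The workflow described above — score each document by its probability of entailing a class-label query, rank, then read only the top fraction needed for a target recall — can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the injectable `entail_prob` scorer are hypothetical stand-ins for an NLI model such as DeBERTa fine-tuned on MNLI.

```python
from typing import Callable, List, Set, Tuple

def rank_by_entailment(
    docs: List[str],
    query: str,
    entail_prob: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Rank documents by P(entailment) of the query given each document.

    `entail_prob(premise, hypothesis)` is a placeholder for an NLI model
    (e.g. a DeBERTa MNLI checkpoint) returning the entailment probability.
    """
    scored = [(doc, entail_prob(doc, query)) for doc in docs]
    # Highest entailment probability first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def fraction_to_read(
    ranked: List[Tuple[str, float]],
    relevant: Set[str],
    target_recall: float,
) -> float:
    """Smallest fraction of the ranked list a reader must go through
    to recover `target_recall` of the relevant documents."""
    needed = target_recall * len(relevant)
    found = 0
    for i, (doc, _) in enumerate(ranked, start=1):
        if doc in relevant:
            found += 1
        if found >= needed:
            return i / len(ranked)
    return 1.0

# Toy usage with a dummy scorer standing in for a real NLI model:
docs = ["protest in the capital", "sports results", "riot reported", "recipe"]
relevant = {"protest in the capital", "riot reported"}
toy_prob = lambda doc, q: 0.9 if doc in relevant else 0.1
ranked = rank_by_entailment(docs, "This text is about civil unrest.", toy_prob)
```

With the toy scorer, both relevant documents land in the top half of the ranking, so full recall is reached after reading 50% of the list; with a real model the scores come from the entailment logit of a (document, query) pair, and the query would be a declarative form of the class label.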