The inference of politically charged information from text data is a popular research topic in Natural Language Processing (NLP) at both the text and author levels. In recent years, studies of this kind have been carried out with the aid of representations from transformers such as BERT. Despite considerable success, however, we may ask whether results can be improved even further by combining transformer-based models with additional knowledge representations. To shed light on this issue, the present work describes a series of experiments comparing alternative model configurations for political inference from text in both English and Portuguese. Results suggest that certain text representations, in particular the combined use of BERT pre-trained language models with a syntactic dependency model, may outperform the alternatives across multiple experimental settings, making a potentially strong case for further research on the use of heterogeneous text representations in these and possibly other NLP tasks.