Building models to detect vaccine attitudes on social media is challenging because of the composite, often intricate aspects involved, and the limited availability of annotated data. Existing approaches have relied heavily on supervised training that requires abundant annotations and pre-defined aspect categories. Instead, with the aim of leveraging the large amount of unannotated data now available on vaccination, we propose a novel semi-supervised approach for vaccine attitude detection, called VADet. A variational autoencoding architecture based on language models is employed to learn the topical information of the domain from unlabelled data. The model is then fine-tuned with a few manually annotated examples of user attitudes. We validate the effectiveness of VADet on our annotated data and also on an existing vaccination corpus annotated with opinions on vaccines. Our results show that VADet is able to learn disentangled stance and aspect topics, and outperforms existing aspect-based sentiment analysis models on both stance detection and tweet clustering.