Even though fine-tuned neural language models have been pivotal in enabling "deep" automatic text analysis, optimizing text representations for specific applications remains a crucial bottleneck. In this study, we examine this problem in the context of a task from computational social science, namely modeling pairwise similarities between political parties. Our research question is what level of structural information is necessary to create robust text representations. We contrast a strongly informed approach (which uses both claim span and claim category annotations) with approaches that forgo one or both types of annotation in favor of document structure-based heuristics. We evaluate our models on the manifestos of German parties for the 2021 federal election and find that heuristics that maximize within-party over between-party similarity, combined with a normalization step, lead to reliable party similarity prediction without the need for manual annotation.
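To make the within-party versus between-party criterion concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each manifesto segment has already been embedded as a vector, that the normalization step is a per-dimension z-score, and that party similarity is the mean pairwise cosine similarity of segments. The function names (`zscore`, `party_similarity`, `within_minus_between`), the party labels "A" and "B", and the toy data are all hypothetical.

```python
import numpy as np


def zscore(embeddings):
    """Standardize each embedding dimension (an assumed normalization step)."""
    mean = embeddings.mean(axis=0)
    std = embeddings.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant dimensions
    return (embeddings - mean) / std


def party_similarity(embeddings, parties):
    """Mean pairwise cosine similarity between the segments of each party pair."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = embeddings / norms
    cos = unit @ unit.T

    parties = np.asarray(parties)
    labels = sorted(set(parties.tolist()))
    sim = np.zeros((len(labels), len(labels)))
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            block = cos[np.ix_(parties == a, parties == b)]
            if i == j:
                # Within-party: exclude each segment's similarity to itself.
                n = block.shape[0]
                if n > 1:
                    sim[i, j] = (block.sum() - np.trace(block)) / (n * (n - 1))
                else:
                    sim[i, j] = 1.0
            else:
                sim[i, j] = block.mean()
    return labels, sim


def within_minus_between(sim):
    """Average within-party similarity minus average between-party similarity."""
    k = sim.shape[0]
    within = np.trace(sim) / k
    between = (sim.sum() - np.trace(sim)) / (k * (k - 1))
    return within - between


# Toy data: two hypothetical parties with five segment embeddings each.
rng = np.random.default_rng(0)
center_a = np.array([1.0, 0.0, 1.0, 0.0])
center_b = np.array([0.0, 1.0, 0.0, 1.0])
embeddings = np.vstack([
    center_a + 0.05 * rng.standard_normal((5, 4)),
    center_b + 0.05 * rng.standard_normal((5, 4)),
])
parties = ["A"] * 5 + ["B"] * 5

labels, sim = party_similarity(zscore(embeddings), parties)
score = within_minus_between(sim)
print(labels)
print(sim)
print(score)
```

Under these assumptions, a representation would be preferred when `within_minus_between` is larger, since segments from the same party then sit closer together than segments from different parties. The off-diagonal entries of `sim` serve as the predicted party similarities.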