Parliamentary transcripts are a valuable resource for understanding society and for tracing the most important events that shape it over time. The political debates they record also support research on political discourse from a computational social science perspective. In this paper we release the first version of a newly compiled corpus of Basque parliamentary transcripts. The corpus is characterized by heavy Basque-Spanish code-switching and is an interesting resource for studying political discourse in two contrasting languages, Basque and Spanish. We enrich the corpus with metadata on relevant attributes of the speakers and speeches (language, gender, party, etc.) and process the text to obtain named entities and lemmas. We then use this metadata to perform a detailed corpus analysis, which provides insights into the language use of Basque political representatives across time, parties and gender.