Our paper aims to analyze political polarization in the US political system using language models, and thereby help voters make informed decisions about their candidates. Making this information available helps voters understand their candidates' views on the economy, healthcare, education, and other social issues. Our main contributions are a dataset extracted from Wikipedia spanning the past 120 years and a language-model-based method for analyzing how polarized a candidate is. We divide the data into two parts, background information and political information about each candidate, since our hypothesis is that a candidate's political views should be based on reason and be independent of factors such as birthplace, alma mater, etc. We further split the data chronologically into four phases to examine whether and how polarization among candidates changes over time. The data has been cleaned to remove biases. To study polarization, we first report results from classical language models, namely Word2Vec and Doc2Vec, and then use more powerful techniques such as the Longformer, a transformer-based encoder, to assimilate more information and find each candidate's nearest neighbors based on their political views and their background.
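The sketch below illustrates, under stated assumptions, the kind of Longformer-based pipeline described above: each candidate's text is encoded into a single vector and nearest neighbors are retrieved by cosine similarity. This is a minimal illustration, not the paper's released code; the model checkpoint, pooling strategy, and function names are our own assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of encoding candidate text
# with a Longformer and retrieving nearest neighbors by cosine similarity.
import numpy as np
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")
model.eval()

def embed(text: str) -> np.ndarray:
    """Encode a long document into one vector by mean-pooling the last hidden states."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0).numpy()

def nearest_neighbors(query_vec: np.ndarray, candidate_vecs: list, k: int = 5) -> np.ndarray:
    """Return indices of the k candidates most similar to the query by cosine similarity."""
    mat = np.stack(candidate_vecs)
    sims = mat @ query_vec / (np.linalg.norm(mat, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)[:k]
```

In such a setup, the political-information and background-information parts of each candidate's Wikipedia text would be embedded separately, so that neighbor lists computed from the two views can be compared.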