This work presents a supervised method for building a classifier of the political stances held by Chinese-speaking politicians and other Twitter users. Many previous studies predict political stance from English tweets, but to the best of our knowledge, this is the first work to build a prediction model on Chinese political tweets. First, data are collected by scraping the tweets of well-known political figures and their related users. Second, the political spectrum is divided into two groups: users who express approval of the Chinese Communist Party and users who do not. Because Chinese text has no spaces between words to mark word boundaries, segmentation and vectorization are then performed with Jieba, a Chinese segmentation tool. Finally, a model is trained on the collected political tweets, producing a classifier that achieves high accuracy in identifying users' political stances from their tweets.
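The pipeline described above (segment, vectorize, train, classify) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the abstract does not name the classifier, so a multinomial Naive Bayes with add-one smoothing is used here; `segment` is a character-bigram stand-in for Jieba so that the example runs with the standard library alone (with Jieba installed, `jieba.lcut(text)` could be substituted); and the four training sentences and the labels `approve` / `disapprove` are invented toy data, not tweets from the study.

```python
import math
from collections import Counter, defaultdict


def segment(text):
    """Hypothetical stand-in for a Chinese segmenter such as Jieba:
    split the text into overlapping character bigrams."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def vectorize(text):
    """Bag-of-words vector: token -> count."""
    return Counter(segment(text))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.
    An assumed classifier; the abstract does not specify one."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            vec = vectorize(text)
            self.token_counts[label].update(vec)
            self.vocab.update(vec)
        return self

    def predict(self, text):
        vec = vectorize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for token, count in vec.items():
                score += count * math.log(
                    (self.token_counts[label][token] + 1) / denom
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples for the two stance groups (not real data).
texts = ["我支持这个政策", "这个决定很好我支持",
         "我反对这个政策", "这个决定很糟我反对"]
labels = ["approve", "approve", "disapprove", "disapprove"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("我们支持"))
print(model.predict("他们反对"))
```

A real system would replace the bigram splitter with a proper word segmenter and evaluate on held-out tweets; the sketch only shows how the stages connect.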