Varieties of Democracy (V-Dem) is a new approach to conceptualizing and measuring democracy and politics. It covers 200 countries and is one of the largest databases in political science. According to the V-Dem Annual Democracy Report 2019, Taiwan is one of the two countries most targeted by false information disseminated by foreign governments. The report also shows that this "made-up news" has caused a great deal of confusion in Taiwanese society and has serious impacts on global stability. Although several applications help identify false information, we found that the pre-processing step of categorizing the news is still done manually. Human labor, however, is error-prone and cannot be sustained for long periods. The growing demand for automation in recent decades shows that when machines perform as well as humans or better, using them can reduce the human burden and cut costs. Therefore, in this work, we build a predictive model to classify news by category. The corpus we use contains 28,358 news articles and 200 news articles scraped from the website of the online newspaper Liberty Times Net (LTN), and covers 8 categories: Technology, Entertainment, Fashion, Politics, Sports, International, Finance, and Health. First, we use Bidirectional Encoder Representations from Transformers (BERT) to obtain word embeddings, which transforms each Chinese character into a (1, 768) vector. Then, we use a Long Short-Term Memory (LSTM) layer to transform the word embeddings into sentence embeddings, and a second LSTM layer to transform the sentence embeddings into document embeddings. Each document embedding is the input to the final prediction model, which contains two Dense layers and one Activation layer. The model maps each document embedding to a vector of 8 real numbers, and the position of the highest value gives the predicted category among the 8 news categories, with up to 99% accuracy.
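The hierarchical pipeline described above (character vectors, then sentence embeddings, then a document embedding, then 8 category scores) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the weights are random and untrained, random vectors stand in for the BERT output, and the hidden sizes, the Dense, Dense, softmax ordering of the final model, the choice of softmax as the Activation layer, and all names (`TinyLSTM`, `rand_dense`, `classify`) are assumptions made only for illustration.

```python
import math
import random

CATEGORIES = ["Technology", "Entertainment", "Fashion", "Politics",
              "Sports", "International", "Finance", "Health"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTM:
    """Minimal LSTM that returns its last hidden state (illustrative, untrained)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, candidate, output),
        # each applied to the concatenation [x_t, h_{t-1}].
        self.w = [[[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)] for _ in range(4)]
        self.b = [[0.0] * hidden_dim for _ in range(4)]

    def encode(self, sequence):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            z = x + h  # list concatenation: [x_t, h_{t-1}]
            pre = [[sum(w * v for w, v in zip(row, z)) + b
                    for row, b in zip(self.w[g], self.b[g])] for g in range(4)]
            i = [sigmoid(v) for v in pre[0]]
            f = [sigmoid(v) for v in pre[1]]
            cand = [math.tanh(v) for v in pre[2]]
            o = [sigmoid(v) for v in pre[3]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

def rand_dense(in_dim, out_dim, rng):
    """Random (weights, bias) pair for a fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def dense(x, weights, bias):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(document, sent_lstm, doc_lstm, head):
    """document: list of sentences; each sentence: list of character vectors."""
    sentence_vecs = [sent_lstm.encode(sentence) for sentence in document]
    doc_vec = doc_lstm.encode(sentence_vecs)
    (w1, b1), (w2, b2) = head
    # Assumed ordering: Dense -> Dense -> softmax activation.
    probs = softmax(dense(dense(doc_vec, w1, b1), w2, b2))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs

rng = random.Random(0)
EMBED_DIM = 768                      # per-character vector size stated in the text
SENT_DIM, DOC_DIM, HIDDEN = 8, 8, 16  # assumed sizes, not from the source

sent_lstm = TinyLSTM(EMBED_DIM, SENT_DIM, rng)
doc_lstm = TinyLSTM(SENT_DIM, DOC_DIM, rng)
head = (rand_dense(DOC_DIM, HIDDEN, rng), rand_dense(HIDDEN, len(CATEGORIES), rng))

# Stand-in for BERT output: two sentences of 5 and 3 characters.
document = [[[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(n_chars)]
            for n_chars in (5, 3)]

label, probs = classify(document, sent_lstm, doc_lstm, head)
print(label, [round(p, 3) for p in probs])
```

Because the weights are untrained, the printed label is arbitrary. The sketch only shows how the shapes flow: each sentence collapses into one vector, the sentence vectors collapse into one document vector, and the index of the largest of the 8 outputs selects the category.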