This article presents a dataset of 10,917 news articles collected between January 1, 2019 and December 31, 2019 and annotated with hierarchical news categories. We manually labelled the articles using a hierarchical taxonomy with 17 first-level and 109 second-level categories. The dataset can be used to train machine learning models that automatically classify news articles by topic, and it may help researchers working on news structuring, classification, and the prediction of future events from published news.