The task of predicting the publication period of text documents, such as news articles, is an important but less studied problem in the field of natural language processing. Predicting the year of a news article can be useful in various contexts, such as historical research, sentiment analysis, and media monitoring. In this work, we investigate the problem of predicting the publication period of a text document, specifically a news article, based on its textual content. In order to do so, we created our own extensive labeled dataset of over 350,000 news articles published by The New York Times over six decades. In our approach, we use a pretrained BERT model fine-tuned for the task of text classification, specifically for time period prediction.This model exceeds our expectations and provides some very impressive results in terms of accurately classifying news articles into their respective publication decades. The results beat the performance of the baseline model for this relatively unexplored task of time prediction from text.
翻译:暂无翻译