Multiple studies have focused on predicting the prospective popularity of an online document as a whole, without paying attention to the contributions of its individual parts. We introduce the task of proactively forecasting popularities of sentences within online news documents solely utilizing their natural language content. We model sentence-specific popularity forecasting as a sequence regression task. For training our models, we curate InfoPop, the first dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents. To the best of our knowledge, this is the first dataset automatically created using streams of incoming search engine queries to generate sentence-level popularity annotations. We propose a novel transfer learning approach involving sentence salience prediction as an auxiliary task. Our proposed technique coupled with a BERT-based neural model exceeds nDCG values of 0.8 for proactive sentence-specific popularity forecasting. Notably, our study presents a non-trivial takeaway: though popularity and salience are different concepts, transfer learning from salience prediction enhances popularity forecasting. We release InfoPop and make our code publicly available: https://github.com/sayarghoshroy/InfoPopularity
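The abstract does not spell out the exact nDCG variant used; as a point of reference, a minimal sketch of the standard nDCG computation for ranking sentences by predicted popularity against gold popularity scores (function names and the linear-gain formulation are illustrative assumptions, not the paper's implementation):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: gains at later ranks are
    # discounted logarithmically (rank 0 gets weight 1 / log2(2)).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(predicted_order, true_scores):
    # predicted_order: sentence indices sorted by predicted popularity (best first).
    # true_scores: gold popularity score for each sentence index.
    gains = [true_scores[i] for i in predicted_order]
    ideal_dcg = dcg(sorted(true_scores, reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A perfect ranking yields an nDCG of 1.0, so the reported values above 0.8 indicate the predicted sentence ordering closely tracks the gold popularity ordering.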