Clickbait articles often have titles phrased as a question or a vague teaser that entices the user to click the link and read the article to find the explanation. We developed a system that automatically finds the answer to, or explanation of, the clickbait hook in the website text, so that users do not need to read through the text themselves. We fine-tune an extractive question answering model (RoBERTa) and an abstractive one (T5) on data scraped from the 'StopClickbait' Facebook pages and Reddit's 'SavedYouAClick' subforum. Both the extractive and the abstractive model improve significantly after fine-tuning. The extractive model performs slightly better according to ROUGE scores, while the abstractive one has a slight edge in terms of BERTScore.
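To make the ROUGE comparison concrete, the following is a minimal sketch of a ROUGE-1 style F1 score, i.e. unigram overlap between a predicted answer and a reference answer. It is an illustration only: the abstract does not say which ROUGE variant or implementation was used, and the lowercasing, whitespace tokenization, function name `rouge1_f1`, and example strings are all assumptions made here.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between prediction and reference.

    Assumption: lowercase + whitespace tokenization, no stemming or
    stopword handling (published ROUGE toolkits may differ).
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: a short extracted span scored against a longer
# human-written answer to the clickbait hook.
score = rouge1_f1("tom hanks", "the actor is tom hanks")
print(round(score, 3))
```

In this made-up example the prediction has perfect precision but covers only two of the five reference tokens, which shows how a short extracted span can be penalized on recall even when it contains the key answer.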