A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles

This paper presents a novel two-stage framework for extracting opinionated sentences from a given news article. In the first stage, a Naive Bayes classifier uses local features to assign each sentence a score, which represents the probability that the sentence is opinionated. In the second stage, we use this score as a prior within the HITS (Hyperlink-Induced Topic Search) schema to exploit the global structure of the article and the relations between its sentences. In the HITS schema, opinionated sentences are treated as hubs and the facts surrounding these opinions are treated as authorities. The algorithm is implemented and evaluated against a set of manually annotated data. We show that using HITS significantly improves precision over the baseline Naive Bayes classifier. We also argue that the proposed method discovers the underlying structure of the article, extracting the various opinions grouped with their supporting facts and with other supporting opinions from the article.
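The abstract does not spell out how the Naive Bayes prior enters the HITS iterations, so the following Python is only a minimal sketch under explicit assumptions, not the authors' method. It assumes sentences are linked by bag-of-words cosine similarity and, hypothetically, weights the link from hub sentence i to authority sentence j by `prior[i] * (1 - prior[j]) * similarity`, so that sentences the classifier considers opinionated act as hubs and likely-factual sentences act as authorities. The helper names `cosine` and `hits_with_prior`, the example sentences, and the prior values are all invented for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hits_with_prior(sentences, prior, iters=50):
    """Weighted HITS over the sentences of one article.

    ASSUMPTION (hypothetical, not taken from the paper): the edge from hub i
    to authority j has weight prior[i] * (1 - prior[j]) * cosine(i, j).
    Returns (hub_scores, authority_scores), each L2-normalised.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    w = [[0.0 if i == j else prior[i] * (1 - prior[j]) * cosine(bags[i], bags[j])
          for j in range(n)] for i in range(n)]
    hub = list(prior)  # start hubs from the Naive Bayes prior
    auth = [1.0] * n
    for _ in range(iters):
        # Authority update: a_j = sum_i w_ij * h_i
        auth = [sum(w[i][j] * hub[i] for i in range(n)) for j in range(n)]
        # Hub update: h_i = sum_j w_ij * a_j
        hub = [sum(w[i][j] * auth[j] for j in range(n)) for i in range(n)]
        # L2-normalise so the scores neither blow up nor vanish
        na = math.sqrt(sum(x * x for x in auth)) or 1.0
        nh = math.sqrt(sum(x * x for x in hub)) or 1.0
        auth = [x / na for x in auth]
        hub = [x / nh for x in hub]
    return hub, auth


if __name__ == "__main__":
    sents = [
        "the plan is a disaster",                    # likely opinion
        "the plan cuts the budget",                  # likely fact
        "critics say the budget cuts are reckless",  # likely opinion
        "the vote is on monday",                     # likely fact
    ]
    hubs, auths = hits_with_prior(sents, [0.9, 0.2, 0.8, 0.1])
    for s, h, a in zip(sents, hubs, auths):
        print(f"hub={h:.3f} auth={a:.3f}  {s}")
```

Under these assumptions, sentences with the highest hub scores are the candidate opinions, and the authorities they link to most strongly are the facts grouped with them. Whitespace tokenisation and cosine similarity are deliberately simple choices; any sentence-similarity measure could be substituted.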


Related content

Naive Bayes classifier


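In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Naive Bayes has been studied extensively since the 1950s. In the early 1960s it was introduced, under a different name, into the text retrieval community, and it remains a popular (baseline) method for text categorization: the problem of deciding which category a document belongs to (for example, spam or legitimate, sports or politics) using word frequencies as features. With appropriate preprocessing, it is competitive in this domain with more advanced methods, including support vector machines. It also has applications in automatic medical diagnosis.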
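To make the description above concrete, here is a minimal multinomial Naive Bayes text classifier in Python that uses word counts as features. It is a generic textbook sketch, not the classifier from the paper: the class name `TinyMultinomialNB` is hypothetical (it is not scikit-learn's API), and whitespace tokenisation, add-one (Laplace) smoothing, and the toy spam/ham training data are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes over word counts."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_count = Counter(labels)          # class -> number of documents
        self.word_count = defaultdict(Counter)    # class -> word -> count
        for doc, y in zip(docs, labels):
            self.word_count[y].update(doc.lower().split())
        self.vocab = {t for c in self.word_count.values() for t in c}
        return self

    def log_joint(self, doc):
        """log P(c) + sum over words of log P(w | c), i.e. the naive independence assumption."""
        n_docs = sum(self.doc_count.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            total = sum(self.word_count[c].values())
            s = math.log(self.doc_count[c] / n_docs)
            for t in doc.lower().split():
                # Add-one smoothing so an unseen word does not zero out the product
                s += math.log((self.word_count[c][t] + 1) / (total + v))
            scores[c] = s
        return scores

    def predict_proba(self, doc):
        """Posterior P(c | doc), normalised with the log-sum-exp trick."""
        scores = self.log_joint(doc)
        m = max(scores.values())
        exps = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)


if __name__ == "__main__":
    nb = TinyMultinomialNB().fit(
        ["win cash now", "cheap cash prize", "meeting at noon", "lunch meeting tomorrow"],
        ["spam", "spam", "ham", "ham"],
    )
    print(nb.predict("cash prize now"))  # spam
```

Working in log space avoids numeric underflow when a document has many words, and the posterior from `predict_proba` is the kind of per-item probability score a downstream stage can consume.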
