This paper presents a novel two-stage framework for extracting opinionated sentences from a news article. In the first stage, a Naive Bayes classifier uses local features to assign each sentence a score: the probability that the sentence is opinionated. In the second stage, we use this score as a prior within the HITS (Hyperlink-Induced Topic Search) schema to exploit the global structure of the article and the relations between its sentences. In the HITS schema, opinionated sentences are treated as hubs and the facts surrounding these opinions are treated as authorities. We implement the algorithm and evaluate it against a set of manually annotated data. We show that using HITS significantly improves precision over the baseline Naive Bayes classifier. We also argue that the proposed method discovers the underlying structure of the article, extracting the various opinions it contains, each grouped with its supporting facts and other supporting opinions.
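The abstract does not specify the local features, the sentence-to-sentence link weights, or exactly how the Naive Bayes prior enters the HITS iterations, so the following is only a minimal sketch under stated assumptions. It assumes bag-of-words features with add-one smoothing for stage one, cosine word overlap as the link weight between sentences, and a hypothetical mixing weight `alpha` that blends the prior into each hub update (in the style of personalized PageRank). The names `hits_with_prior`, `NaiveBayes`, and `alpha`, as well as the toy training data, are illustrative and are not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumed "local features": lowercase alphabetic tokens with punctuation stripped.
    tokens = (t.strip(".,;:!?\"'()").lower() for t in sentence.split())
    return [w for w in tokens if w.isalpha()]


class NaiveBayes:
    """Stage 1: multinomial Naive Bayes with add-one smoothing (label 1 = opinionated)."""

    def fit(self, sentences, labels):
        # Both classes must appear in the training data, or the class prior is log(0).
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for s, y in zip(sentences, labels):
            self.counts[y].update(tokenize(s))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def prob_opinionated(self, sentence):
        n = sum(self.docs.values())
        logp = {}
        for y in (0, 1):
            lp = math.log(self.docs[y] / n)
            denom = self.totals[y] + len(self.vocab)
            for w in tokenize(sentence):
                if w in self.vocab:  # words never seen in training are ignored
                    lp += math.log((self.counts[y][w] + 1) / denom)
            logp[y] = lp
        # Turn the two log-joint scores into P(opinionated | sentence), numerically stable.
        m = max(logp.values())
        e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
        return e1 / (e0 + e1)


def cosine(a, b):
    # Assumed link weight: cosine similarity of the two sentences' word-count vectors.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def normalize(v):
    s = sum(v)
    return [x / s for x in v] if s > 0 else [1.0 / len(v)] * len(v)


def hits_with_prior(sentences, prior, alpha=0.5, iters=50):
    """Stage 2: HITS where hubs ~ opinions and authorities ~ facts (hypothetical variant)."""
    n = len(sentences)
    # W[i][j] links sentence i (as a hub) to sentence j (as an authority); no self-links.
    W = [[cosine(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    p = normalize(prior)
    hub, auth = p[:], [1.0 / n] * n
    for _ in range(iters):
        # Authority update: a fact scores highly if strong opinion hubs point to it.
        auth = normalize([sum(W[i][j] * hub[i] for i in range(n)) for j in range(n)])
        # Hub update: blend the Naive Bayes prior with the support from authorities.
        prop = normalize([sum(W[i][j] * auth[j] for j in range(n)) for i in range(n)])
        hub = normalize([alpha * p[i] + (1 - alpha) * prop[i] for i in range(n)])
    return hub, auth


# Toy usage with made-up sentences (not the paper's data).
train = [
    ("The minister said the plan is a disastrous mistake", 1),
    ("Critics argue the policy is unfair and wrong", 1),
    ("The bill was passed on Tuesday", 0),
    ("The company reported revenue of 5 billion dollars", 0),
]
nb = NaiveBayes().fit([s for s, _ in train], [y for _, y in train])

article = [
    "The bill was passed on Tuesday after a long debate",
    "Critics argue the bill is unfair to small businesses",
    "Small businesses employ most workers in the region",
    "The minister said the debate was a disastrous waste of time",
]
prior = [nb.prob_opinionated(s) for s in article]
hub, auth = hits_with_prior(article, prior)
for score, s in sorted(zip(hub, article), reverse=True):
    print(f"{score:.3f}  {s}")
```

Setting `alpha=1.0` reduces the hub scores to the normalized Naive Bayes prior, while smaller values let the article's global structure reorder the sentences; in this sketch, that mixing weight is where the first stage's local evidence meets the second stage's graph evidence.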