Decision-making usually involves five steps: identifying the problem, collecting data, extracting evidence, identifying pro and con arguments, and making the decision. Focusing on evidence extraction, this paper presents a hybrid model that combines latent Dirichlet allocation (LDA) and word embeddings to obtain external knowledge from structured and unstructured data. We study the task of sentence-level argument mining, since identifying and understanding arguments usually requires some degree of world knowledge. Given a topic and a sentence, the goal is to classify whether the sentence is an argument with regard to the topic. We use a topic model to extract topic- and sentence-specific evidence from the structured knowledge base Wikidata, building a graph based on the cosine similarity between the word vectors of Wikidata entities and the vector of the given sentence. To address the general incompleteness of structured knowledge bases, we also build a second graph from topic-specific articles found via Google. Combining the two graphs yields a graph-based model which, as our evaluation shows, successfully capitalizes on both structured and unstructured data.
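The graph construction described above can be illustrated with a minimal sketch. The abstract does not specify how edges are selected or how the two graphs are combined, so the similarity threshold, the max-weight merge rule, the function names, and the toy three-dimensional vectors below are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(sentence_vec, entity_vecs, threshold=0.5):
    """Link the sentence node to each entity whose vector is similar enough.

    The threshold is a hypothetical parameter; the abstract only states that
    the graph is based on cosine similarity.
    """
    edges = {}
    for name, vec in entity_vecs.items():
        sim = cosine(sentence_vec, vec)
        if sim >= threshold:
            edges[("sentence", name)] = sim
    return edges


def merge_graphs(g1, g2):
    """Union of edges; keep the larger weight when both graphs share an edge (assumed rule)."""
    merged = dict(g1)
    for edge, weight in g2.items():
        merged[edge] = max(weight, merged.get(edge, weight))
    return merged


# Toy vectors, invented for illustration only.
sentence = [1.0, 0.0, 1.0]
wikidata_entities = {"nuclear_power": [1.0, 0.0, 1.0], "football": [0.0, 1.0, 0.0]}
web_entities = {"reactor_safety": [1.0, 0.0, 0.0], "nuclear_power": [1.0, 0.0, 0.5]}

kb_graph = build_graph(sentence, wikidata_entities)   # structured source
web_graph = build_graph(sentence, web_entities)       # unstructured source
combined = merge_graphs(kb_graph, web_graph)

for edge, weight in sorted(combined.items()):
    print(edge, round(weight, 3))
```

In this toy setting, the second graph contributes an entity (`reactor_safety`) that the first lacks, which mirrors the stated motivation of compensating for incomplete structured knowledge bases.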