Personal knowledge bases (PKBs) are crucial for a broad range of applications such as personalized recommendation and Web-based chatbots. A critical challenge in building PKBs is extracting personal attribute knowledge from users' conversation data. Given a set of users of a conversational system, a personal attribute, and these users' utterances, our goal is to predict, for each user, a ranking of the candidate values of that attribute. Previous studies often rely on a considerable amount of resources such as labeled utterances and external data, yet the attribute knowledge embedded in unlabeled utterances is underutilized, and their performance on some difficult personal attributes remains unsatisfactory. In addition, some text classification methods can be applied to this task directly; however, they also perform poorly on those difficult personal attributes. In this paper, we propose a novel framework, PEARL, which predicts personal attributes from conversations by leveraging the abundant personal attribute knowledge in utterances under a low-resource setting in which no labeled utterances or external data are used. PEARL seamlessly combines biterm semantic information with word co-occurrence information by using the updated prior attribute knowledge to refine the Gibbs sampling process of the biterm topic model in an iterative manner. Extensive experimental results show that PEARL outperforms all baseline methods, not only on personal attribute prediction from conversations over two data sets, but also on the more general weakly supervised text classification task over one data set.
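To make the notion of a biterm concrete: in the biterm topic model, a biterm is an unordered pair of words that co-occur in the same short text. The sketch below only illustrates that extraction step on toy utterances and is not PEARL's implementation; the function names, the whitespace tokenization, and the example utterances are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return the unordered word pairs (biterms) of one tokenized utterance."""
    # Sort each pair so that ("teach", "math") and ("math", "teach") coincide.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(utterances):
    """Count biterms over a collection of utterances.

    Assumption: lowercasing plus whitespace splitting stands in for
    whatever tokenization a real pipeline would use.
    """
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical utterances from one user, for illustration only.
utterances = [
    "i teach math at school",
    "my students love math",
]
biterm_counts = count_biterms(utterances)
```

Counts of this kind are the corpus-level co-occurrence statistics that a biterm topic model draws topics over; how PEARL injects prior attribute knowledge into the sampling is not shown here.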