Inspired by the notion of center of mass in physics, we propose an extension called Semantic Center of Mass (SCOM) and use it to discover the abstract "topic" of a document. The notion is developed within a framework called the Understanding Map Supervised Topic Model (UM-S-TM). UM-S-TM is designed so that both the document content and a semantic network (specifically, an Understanding Map) play a role in interpreting the meaning of a document. Based on different justifications, we devise three possible methods for discovering the SCOM of a document. We conduct experiments on artificial documents and Understanding Maps to test the outcomes of these methods. In addition, we test the ability of UM-S-TM to vectorize documents and to capture sequential information. We also compare UM-S-TM with probabilistic topic models such as Latent Dirichlet Allocation (LDA) and probabilistic Latent Semantic Analysis (pLSA).
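For reference, the physical notion that SCOM extends can be written as follows. This is only the standard center of mass of point masses; the symbols m_i (mass), r_i (position), and M (total mass) are the usual physics notation and are not taken from this abstract, which does not state how the three SCOM methods map terms and the Understanding Map onto these quantities.

```latex
\[
  \mathbf{R} \;=\; \frac{1}{M} \sum_{i=1}^{n} m_i \,\mathbf{r}_i ,
  \qquad
  M \;=\; \sum_{i=1}^{n} m_i .
\]
```

Read by analogy, and purely as an assumption, a document's terms would play the role of the point masses and the semantic network would supply their positions, so that the "topic" is the point about which the document's content balances.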