This paper proposes a new methodology for studying sequential corpora: a two-stage algorithm that learns time-based topics with respect to a scale of document positions. It introduces the concept of Topic Scaling, which ranks the learned topics on the same scale as the documents. The first stage uses Wordfish, a Poisson-based document scaling method, to estimate document positions; in the second stage, these positions serve as the dependent variable for learning relevant topics via supervised latent Dirichlet allocation. The approach brings two innovations to text mining: it explains document positions, whose scale is a latent variable, and it ranks the inferred topics on the document scale to match their occurrences within the corpus and track their evolution. Tested on the two-party U.S. State of the Union addresses, this inductive approach reveals that each party dominates one end of the learned scale, with interchangeable transitions that follow the parties' terms of office. Besides achieving high accuracy in predicting in-sample document positions from topic scores, the method uncovers further hidden topics that differentiate similar documents: increasing the number of learned topics unfolds potential nested hierarchical topic structures. Compared to other popular topic models, Topic Scaling learns topics with respect to document similarities without requiring a time frequency to be specified for learning topic evolution, thus capturing broader topic patterns than dynamic topic models and yielding more interpretable outputs than plain latent Dirichlet allocation.
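The two stages can be sketched with the standard formulations of Wordfish and supervised LDA from the literature. The notation below is an assumption made for illustration and is not taken from this paper: $y_{ij}$ is the count of word $j$ in document $i$, $\omega_i$ is the latent document position, and $\bar{z}_i$ is the vector of empirical topic proportions of document $i$. Reading the regression coefficients $\eta_k$ as topic positions on the document scale is likewise an interpretive assumption.

```latex
% Stage 1 (Wordfish): Poisson scaling of word counts
\begin{align}
  y_{ij} &\sim \mathrm{Poisson}(\lambda_{ij}), \\
  \log \lambda_{ij} &= \alpha_i + \psi_j + \beta_j \, \omega_i ,
\end{align}
% alpha_i: document fixed effect, psi_j: word fixed effect,
% beta_j: word discrimination weight, omega_i: document position.

% Stage 2 (supervised LDA): the estimated position is the response
\begin{align}
  \hat{\omega}_i \mid \bar{z}_i, \eta, \sigma^2
    &\sim \mathcal{N}\!\left(\eta^{\top} \bar{z}_i,\; \sigma^2\right), \\
  \bar{z}_i &= \frac{1}{N_i} \sum_{n=1}^{N_i} z_{in} ,
\end{align}
% z_{in}: topic-assignment indicator of the n-th word in document i,
% N_i: number of words in document i, eta_k: coefficient of topic k.
```

Under these assumptions, a topic with a large positive $\eta_k$ pulls the predicted position of a document toward one end of the scale and a large negative $\eta_k$ toward the other, which is one way to read the ranking of topics on the document scale.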