In this case study, Taxicab Correspondence Analysis reveals that the underlying structure of an extremely sparse binary textual data set can be represented by a binary tree whose nodes, which represent clusters of words, can be interpreted as topics. The data set combines the text of Israel's Declaration of Independence with text from 40 diverse Israeli interviewees. The analysis thus supports a compare-and-contrast study of textual data from two different sources. Furthermore, we propose an adjusted sparsity index that takes into account the size of the data table.