The TCPD-IPD dataset is a collection of questions and answers discussed in the Lower House of the Parliament of India during the Question Hour between 1999 and 2019. Although it is difficult to analyze such a huge collection manually, modern text analysis tools can provide a powerful means to navigate it. In this paper, we perform an exploratory analysis of the dataset. In particular, we present insightful corpus-level statistics and a detailed analysis of three subsets of the dataset. In the latter analysis, the focus is on understanding the temporal evolution of topics using a dynamic topic model. We observe that the parliamentary conversation indeed mirrors the political and socio-economic tensions of each period.