In the big-data age, tabular datasets are being generated and analyzed everywhere. As a consequence, finding and understanding the relationships between the features of these datasets is of great relevance. To capture these relationships, we propose a methodology that maps an entire tabular dataset, or just a single observation, into a weighted directed graph using the Shapley additive explanations (SHAP) technique. With this graph of relationships, we show that inferring the hierarchical modular structure with the nested stochastic block model (nSBM), together with studying the spectral space of the magnetic Laplacian, can help identify classes of features and unravel non-trivial relationships. As a case study, we analyzed a socioeconomic survey conducted with students in Brazil: the PeNSE survey. The spectral embedding of the columns suggested that questions related to physical activities form a separate group. The nSBM approach corroborated this result and yielded complementary findings about the modular structure: some groups of questions agreed closely with the divisions qualitatively defined by the designers of the survey. However, some questions from the class \textit{Safety} were grouped by our method into the class \textit{Drugs}. Surprisingly, on inspecting these questions, we observed that they relate to both topics, which suggests an alternative interpretation of these questions. Our method can provide guidance for tabular data analysis as well as for the design of future surveys.
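To make the spectral step concrete, the following is a minimal sketch of a magnetic Laplacian and its low-frequency embedding for a small weighted directed graph. It uses one common textbook definition (symmetrized weights modulated by a complex phase that encodes edge direction). The charge parameter `q`, the toy weight matrix `W`, and the function names are assumptions made for illustration; the abstract does not specify the exact construction or how the SHAP-derived edge weights are computed, so this should not be read as the authors' implementation.

```python
import numpy as np

def magnetic_laplacian(W, q=0.25):
    """Magnetic Laplacian of a weighted directed graph (illustrative definition).

    W[i, j] is the weight of the edge i -> j. The value q = 0.25 is an
    assumed default for the charge parameter, not taken from the source.
    """
    W = np.asarray(W, dtype=float)
    W_sym = (W + W.T) / 2.0                  # undirected (symmetrized) weights
    theta = 2.0 * np.pi * q * (W - W.T)      # antisymmetric phase encodes direction
    H = W_sym * np.exp(1j * theta)           # Hermitian "magnetic" adjacency
    D = np.diag(W_sym.sum(axis=1))           # degree matrix of symmetrized graph
    return D - H

def spectral_embedding(W, k=2, q=0.25):
    """Return the k smallest eigenvalues and their eigenvectors."""
    L = magnetic_laplacian(W, q)
    eigvals, eigvecs = np.linalg.eigh(L)     # eigh: Hermitian input, ascending order
    return eigvals[:k], eigvecs[:, :k]

# Hypothetical 3-feature graph; in the described method the weights would
# come from SHAP values rather than being written by hand.
W = np.array([[0.0, 0.8, 0.0],
              [0.1, 0.0, 0.5],
              [0.3, 0.0, 0.0]])

L = magnetic_laplacian(W)
vals, vecs = spectral_embedding(W, k=2)
```

Because the matrix is Hermitian, its eigenvalues are real, while the eigenvectors are complex: their magnitudes and phases can both be used as coordinates when embedding the features.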