Multiple regression and principal component analysis (PCA) are popular methods for modeling labelled and unlabelled data, respectively, and have been applied in research to a vast number of datasets. In this investigation, we attempt to push the limits of these two methods by fitting them to world development data, a dataset notorious for its complexity and high dimensionality. We assess the robustness and numerical stability of both methods using the condition number of their underlying matrices and their ability to capture variance in the dataset. The results indicate poor performance from both methods from a numerical standpoint, yet certain qualitative insights can still be drawn.
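As a rough illustration of the two diagnostics named above, the following minimal sketch computes a matrix condition number and per-component explained variance on synthetic data. It is not the authors' code: the helper names `condition_number` and `explained_variance_ratio`, the random data, and the choice of the 2-norm condition number are all assumptions made for the example.

```python
import numpy as np

def condition_number(X):
    """2-norm condition number: largest singular value over the smallest."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    return s[0] / s[-1]

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                 # center each column
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2                            # proportional to component variances
    return var / var.sum()

# Synthetic stand-in for a real dataset (assumption): two nearly
# collinear columns plus one independent column.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)         # almost a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

kappa = condition_number(X)
evr = explained_variance_ratio(X)

print("condition number:", kappa)
print("explained variance ratio:", evr)
```

A large condition number signals that regression coefficients are sensitive to small perturbations in the data, while a near-zero trailing entry in the explained variance ratio points to the same redundancy from the PCA side.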