Ordinal data occur frequently in the social sciences. When principal components analysis (PCA) is applied, however, those data are often treated as numeric, which implies linear relationships between the variables at hand; alternatively, non-linear PCA is applied, where the obtained quantifications are sometimes hard to interpret. Non-linear PCA for categorical data, also called optimal scoring/scaling, constructs new variables by assigning numerical values to categories such that the proportion of variance in those new variables that is explained by a predefined number of principal components is maximized. We propose a penalized version of non-linear PCA for ordinal variables that is intermediate between standard PCA on category labels and non-linear PCA as used so far. The new approach is by no means limited to monotonic effects, and it offers both better interpretability of the non-linear transformation of the category labels and better performance on validation data than unpenalized non-linear PCA and/or standard linear PCA. In particular, an application of penalized non-linear PCA to ordinal data as given with the International Classification of Functioning, Disability and Health (ICF) is provided.
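To make the idea of penalized optimal scaling concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a second-order difference penalty on the category quantifications (the abstract does not state the form of the penalty), a common number of categories `k` coded `0..k-1` for all variables, and a generic alternating least squares scheme in which the re-standardization after each penalized update is a heuristic simplification. The names `penalized_nlpca`, `explained`, `lam` and `n_iter` are hypothetical. With `lam = 0` the sketch reduces to unpenalized optimal scaling; for very large `lam` the quantifications are forced to be linear in the category labels, which corresponds to standard PCA on the labels.

```python
import numpy as np

def explained(H, m):
    """Share of total variance captured by the first m principal components."""
    eig = np.linalg.eigvalsh(np.corrcoef(H, rowvar=False))  # ascending order
    return eig[-m:].sum() / eig.sum()

def penalized_nlpca(X, k, m=1, lam=0.0, n_iter=100):
    """Alternating least squares sketch of penalized optimal scaling.

    X   : (n, p) integer array, categories coded 0..k-1 (assumption)
    lam : weight of the assumed second-order difference penalty
    Returns the standardized transformed data H and the quantifications theta.
    """
    n, p = X.shape
    D = np.diff(np.eye(k), n=2, axis=0)          # (k-2, k) second differences
    Z = [np.eye(k)[X[:, j]] for j in range(p)]   # indicator (dummy) matrices
    theta = np.empty((p, k))
    H = np.empty((n, p))
    for j in range(p):                           # start from linear scores
        x = X[:, j].astype(float)
        theta[j] = (np.arange(k) - x.mean()) / x.std()
        H[:, j] = theta[j][X[:, j]]
    for _ in range(n_iter):
        # Step 1: best rank-m approximation of the current transformed data.
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        H_hat = (U[:, :m] * s[:m]) @ Vt[:m]
        # Step 2: penalized least squares update of each variable's scores.
        for j in range(p):
            A = np.vstack([Z[j], np.sqrt(lam * n) * D])
            b = np.concatenate([H_hat[:, j], np.zeros(k - 2)])
            t = np.linalg.lstsq(A, b, rcond=None)[0]
            z = t[X[:, j]]
            sd = z.std()
            if sd > 1e-12:                       # skip degenerate updates
                theta[j] = (t - z.mean()) / sd   # mean 0, variance 1
                H[:, j] = theta[j][X[:, j]]
    return H, theta

# Illustrative synthetic data: four 5-level items driven by one latent trait.
rng = np.random.default_rng(0)
latent = rng.normal(size=300)
cut_sets = [[-1.0, -0.3, 0.3, 1.0], [-0.2, 0.0, 0.2, 1.5],
            [-1.5, -1.0, 0.0, 0.5], [-0.5, 0.5, 1.0, 1.2]]
X = np.column_stack([np.digitize(latent + rng.normal(scale=0.5, size=300), c)
                     for c in cut_sets])

for lam in (0.0, 0.1, 10.0):
    H, theta = penalized_nlpca(X, k=5, m=1, lam=lam)
    print(lam, round(explained(H, 1), 4), np.round(theta[1], 2))
```

In this sketch, the variance explained on the data used for fitting can only decrease as `lam` grows; any benefit of the penalty would have to show up on held-out data, so in practice `lam` would be chosen by some form of validation.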