Ordinal data occur frequently in the social sciences. When principal component analysis (PCA) is applied, however, such data are often treated as numeric, which implies linear relationships between the variables at hand. Alternatively, non-linear PCA is applied, but the resulting quantifications are sometimes hard to interpret. Non-linear PCA for categorical data, also called optimal scoring/scaling, constructs new variables by assigning numerical values to categories. The values are chosen to maximize the proportion of variance in those new variables that is explained by a predefined number of principal components. We propose a penalized version of non-linear PCA for ordinal variables that is a smoothed intermediate between standard PCA on category labels and non-linear PCA as used so far. The new approach is by no means limited to monotonic effects. It offers both better interpretability of the non-linear transformation of the category labels and better performance on validation data than unpenalized non-linear PCA and/or standard linear PCA. In particular, we present an application of penalized optimal scaling to ordinal data as given by the International Classification of Functioning, Disability and Health (ICF).
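The following is a minimal Python sketch of penalized optimal scaling under stated assumptions. It is not the authors' algorithm. Several choices here are illustrative assumptions: the function names `penalized_nlpca` and `second_diff_matrix`, the alternating least squares scheme, and the squared second-order difference penalty on adjacent category scores with weight `lam`. Under a second-order penalty, equally spaced scores incur no penalty. A very large `lam` therefore keeps the quantifications essentially linear in the category labels, which corresponds to standard PCA on those labels. Setting `lam = 0` gives the unpenalized (nominal) non-linear PCA update.

```python
import numpy as np


def second_diff_matrix(k):
    """Hypothetical helper: (k-2) x k matrix D with rows (1, -2, 1).

    D @ q == 0 exactly when the scores q are equally spaced (linear in the label).
    """
    D = np.zeros((max(k - 2, 0), k))
    for i in range(k - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D


def standardize(x):
    return (x - x.mean()) / x.std()


def penalized_nlpca(data, n_comp=1, lam=1.0, n_iter=200, tol=1e-10):
    """Hypothetical sketch of penalized optimal scaling via alternating least squares.

    data: (n, m) integer array; column j holds category codes 0..K_j-1, and every
    category is assumed to be observed. Returns (quantifications, share of
    variance explained by the first n_comp components).
    """
    n, m = data.shape
    # indicator (dummy) matrix G_j for each ordinal variable
    G = [np.eye(data[:, j].max() + 1)[data[:, j]] for j in range(m)]
    # assumed penalty: lam * ||D q_j||^2 with second-order differences
    P = [lam * (D.T @ D) for D in (second_diff_matrix(g.shape[1]) for g in G)]
    # start from the category labels themselves, i.e. standard linear PCA
    q = [np.arange(g.shape[1], dtype=float) for g in G]
    X = np.column_stack([standardize(G[j] @ q[j]) for j in range(m)])
    vaf_old = -np.inf
    for _ in range(n_iter):
        # best rank-n_comp least-squares approximation of the transformed data
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_hat = (U[:, :n_comp] * s[:n_comp]) @ Vt[:n_comp]
        for j in range(m):
            # penalized least squares: (G'G + lam D'D) q = G' x_hat
            q_j = np.linalg.solve(G[j].T @ G[j] + P[j], G[j].T @ X_hat[:, j])
            x = G[j] @ q_j
            # centre and rescale so the transformed variable has mean 0, variance 1
            q[j] = (q_j - x.mean()) / x.std()
            X[:, j] = G[j] @ q[j]
        s = np.linalg.svd(X, compute_uv=False)
        vaf = np.sum(s[:n_comp] ** 2) / np.sum(s ** 2)
        if abs(vaf - vaf_old) < tol:
            break
        vaf_old = vaf
    return q, vaf
```

In this sketch, `lam` would be chosen on validation data (for example by cross-validation), which matches the abstract's emphasis on out-of-sample performance. Intermediate values of `lam` smooth the category scores toward a linear pattern without forcing monotonicity.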