Scaling methods have long been used to simplify and cluster high-dimensional data. However, the latent spaces derived from these methods are sometimes uninformative or fail to identify significant differences in the data. To tackle this common issue, we adopt an emerging analysis approach called contrastive learning. We contribute to this field by extending its ideas to multiple correspondence analysis (MCA), enabling the analysis of data often encountered by social scientists, namely binary, ordinal, and nominal variables. We demonstrate the utility of contrastive MCA (cMCA) by analyzing three different surveys of voters in Europe, Japan, and the United States. Our results suggest that, first, cMCA can identify substantively important dimensions and divisions among (sub)groups that are overlooked by traditional methods; second, in certain cases, cMCA can still derive latent traits that generalize across and apply to multiple groups in the dataset; and finally, when data are high-dimensional and unstructured, cMCA provides objective heuristics, above and beyond the standard results, that enable more complex subgroup analysis.
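To make the idea of a contrastive extension of MCA concrete, the following is a minimal sketch, not the authors' implementation. It assumes the general contrastive recipe of eigendecomposing a target-group matrix minus `alpha` times a background-group matrix, applied here to standardized Burt-style matrices built from one-hot (indicator) coded categorical data. The function names (`indicator_matrix`, `standardized_burt`, `contrastive_mca`), the contrast parameter `alpha`, the unscaled row projection, and the random toy data are all illustrative assumptions.

```python
import numpy as np

def indicator_matrix(X, n_levels):
    """One-hot encode integer-coded categorical columns (levels 0..k-1)."""
    X = np.asarray(X)
    return np.hstack([np.eye(k)[X[:, j]] for j, k in enumerate(n_levels)])

def standardized_burt(G, eps=1e-12):
    """Standardized residuals of the Burt matrix G^T G (symmetric)."""
    B = G.T @ G                      # category-by-category co-occurrence counts
    P = B / B.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # category masses
    expected = np.outer(r, r)        # expected proportions under independence
    # eps guards against categories that never occur in a group
    return (P - expected) / np.sqrt(np.maximum(expected, eps))

def contrastive_mca(X_target, X_background, n_levels, alpha=1.0, n_components=2):
    """Assumed contrastive step: top eigenvectors of S_target - alpha * S_background."""
    G_t = indicator_matrix(X_target, n_levels)
    G_b = indicator_matrix(X_background, n_levels)
    C = standardized_burt(G_t) - alpha * standardized_burt(G_b)
    eigvals, eigvecs = np.linalg.eigh(C)              # C is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    # Unscaled projection of target rows onto the contrastive directions
    return G_t @ eigvecs[:, order], eigvals[order]

# Toy data: three categorical survey items with 3, 2, and 4 response levels
rng = np.random.default_rng(0)
n_levels = [3, 2, 4]
X_t = np.column_stack([rng.integers(0, k, size=200) for k in n_levels])
X_b = np.column_stack([rng.integers(0, k, size=150) for k in n_levels])
coords, eigvals = contrastive_mca(X_t, X_b, n_levels, alpha=1.0)
```

In this sketch, setting `alpha=0` reduces the decomposition to one based on the target group alone, while larger values of `alpha` more strongly discount directions of variation that the background group shares with the target group.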