Inferred vs traditional personality assessment: are we predicting the same thing?

Researchers widely use machine learning methods to predict psychological characteristics from digital records. To find out whether automatic personality estimates retain the properties of the original traits, we reviewed 220 recent articles. First, we collected the predictive quality estimates from the subset of studies that report separate training, validation, and testing phases, a separation that is critical for obtaining correct quality estimates in machine learning. Only 20% of the reviewed papers met this criterion. To compare the reported quality estimates, we converted them to approximate Pearson correlations. The credible upper limits for correlations between predicted and self-reported personality traits range from 0.42 to 0.48, depending on the specific trait. These values are substantially below the correlations between traits measured with distinct self-report questionnaires. This suggests that we cannot readily interpret personality predictions as estimates of the original traits, nor expect predicted personality traits to reliably reproduce known relationships with life outcomes. Next, we complement the evaluation of quality estimates with evidence on the psychometric properties of predicted traits. The few existing results suggest that predicted traits are less stable over time and have lower effective dimensionality than self-reported personality. Text-based predictive models perform substantially worse outside their training domains but stay above a random baseline. The evidence on the relationships between predicted traits and external variables is mixed. Predictive features are difficult to use for validation because prior hypotheses about them are lacking. Thus, predicted personality traits fail to retain important properties of the original characteristics. This calls for cautious use and targeted validation of the predictive models.
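The abstract does not state which formulas were used to convert the reported quality metrics to approximate Pearson correlations. As an illustration only, the sketch below applies two standard textbook conversions: the square root of R² for regression results, and AUC to Cohen's d to r for binary classification results. The function names `r_from_r2` and `r_from_auc` are hypothetical, and the AUC conversion assumes equal-variance normal score distributions and equally sized groups; the authors' actual procedure may differ.

```python
import math
from statistics import NormalDist


def r_from_r2(r2: float) -> float:
    """Approximate Pearson r from a coefficient of determination.

    Assumption: predictions are roughly calibrated, so r is close to sqrt(R^2).
    A negative out-of-sample R^2 is clamped to r = 0 here (an illustrative choice).
    """
    return math.sqrt(max(r2, 0.0))


def r_from_auc(auc: float) -> float:
    """Approximate point-biserial r from a ROC AUC.

    Assumptions: both classes have normally distributed scores with equal
    variance, and the two classes are the same size.
    """
    if not 0.0 < auc < 1.0:
        raise ValueError("auc must lie strictly between 0 and 1")
    # AUC -> Cohen's d under the equal-variance normal model
    d = math.sqrt(2.0) * NormalDist().inv_cdf(auc)
    # Cohen's d -> r for two groups of equal size
    return d / math.sqrt(d * d + 4.0)


if __name__ == "__main__":
    print(f"R^2 = 0.20 -> r ~ {r_from_r2(0.20):.2f}")
    print(f"AUC = 0.75 -> r ~ {r_from_auc(0.75):.2f}")
```

Putting metrics from different studies on a common scale in this way is what makes a comparison against a single upper limit possible, but each conversion carries its own assumptions, so the resulting values are best read as approximations.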
