Personality recognition from text is typically cast as hard-label classification, which obscures the graded, prototype-like nature of human personality judgments. We present ProtoMBTI, a cognitively aligned framework for MBTI inference that operationalizes prototype theory within an LLM-based pipeline. First, we construct a balanced, quality-controlled corpus via LLM-guided multi-dimensional augmentation (semantic, linguistic, sentiment). Next, we fine-tune a lightweight (<=2B) encoder with LoRA to learn discriminative embeddings and to build a standardized bank of personality prototypes. At inference, we retrieve the top-k prototypes for a query post and run a retrieve--reuse--revise--retain cycle: the model aggregates prototype evidence via prompt-based voting, revises its prediction when inconsistencies arise, and, upon a correct prediction, retains the sample to continually enrich the prototype library. Across the Kaggle and Pandora benchmarks, ProtoMBTI outperforms baselines on both the four MBTI dichotomies and the full 16-type task, and exhibits robust cross-dataset generalization. Our results indicate that aligning the inference process with psychological prototype reasoning yields gains in accuracy, interpretability, and transfer for text-based personality modeling.
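To make the retrieve--reuse--revise--retain cycle concrete, the following is a minimal Python sketch of one inference pass. It assumes cosine-similarity retrieval over prototype embeddings and uses two hypothetical callables, `llm_vote` and `llm_revise`, as stand-ins for the paper's prompt-based voting and revision steps; the inconsistency test (disagreement among retrieved labels) is likewise an illustrative simplification, not the authors' implementation.

```python
import numpy as np

def cosine_topk(query_emb, proto_embs, k=5):
    # Cosine similarity between the query embedding and every prototype.
    sims = proto_embs @ query_emb / (
        np.linalg.norm(proto_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8
    )
    return np.argsort(-sims)[:k]

def protombti_cycle(query_emb, proto_embs, proto_labels, llm_vote, llm_revise,
                    true_label=None, k=5):
    """One retrieve-reuse-revise-retain pass (illustrative sketch only)."""
    # Retrieve: nearest prototypes in the learned embedding space.
    idx = cosine_topk(query_emb, proto_embs, k)
    evidence = [proto_labels[i] for i in idx]

    # Reuse: aggregate prototype evidence, e.g. via a prompt-based vote.
    pred = llm_vote(evidence)

    # Revise: if the retrieved evidence is inconsistent, request a revision.
    if len(set(evidence)) > 1:
        pred = llm_revise(evidence, pred)

    # Retain: on a correct prediction, grow the prototype bank with the sample.
    if true_label is not None and pred == true_label:
        proto_embs = np.vstack([proto_embs, query_emb])
        proto_labels = proto_labels + [pred]

    return pred, proto_embs, proto_labels
```

In this reading, the prototype bank is an append-only store of labeled embeddings, so continual enrichment amounts to stacking the retained sample's embedding and label onto the existing arrays.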