This work presents a data-centric Natural Language Processing (NLP) approach to predicting personality types based on the MBTI, an introspective self-assessment questionnaire that indicates different psychological preferences in how people perceive the world and make decisions. The approach systematically enriches the text representation, guided by domain knowledge, by generating features from three types of analysis: sentiment, grammatical and aspect analysis. The experiments used a robust baseline of stacked models, with early hyperparameter optimization through grid search and gradual feedback, for each of the four MBTI classifiers (dichotomies). The results show that attention to the data iteration loop, focused on quality, explanatory power and representativeness in order to abstract the features most relevant to the studied phenomenon, improved the evaluation metrics more quickly and at lower cost than complex models such as LSTM or state-of-the-art models such as BERT. Comparisons made from several perspectives underline the importance of these results. In addition, the study reveals broad scope for evolving and deepening the task, and suggests possible approaches for extending the abstraction of personality types further.