In this paper, we analyzed the IBM Employee Attrition dataset to identify the main reasons why employees choose to resign. First, we used a correlation matrix to identify features that were not significantly correlated with other attributes and removed them from the dataset. Second, we selected important features using Random Forest and found that monthly income, age, and the number of companies worked at significantly affected employee attrition. Next, we grouped employees into two clusters using K-means clustering. Finally, we performed a quantitative analysis with binary logistic regression: attrition among employees who traveled frequently was 2.4 times higher than among those who rarely traveled. We also found that employees who work in Human Resources have a higher tendency to leave.
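As an illustration of how a ratio such as the one reported for frequent versus rare travelers can be read from a binary logistic regression, the following is a minimal sketch. It assumes that the ratio is an odds ratio obtained by exponentiating the coefficient of a binary travel indicator, which the abstract does not state explicitly. The counts are invented for illustration and are not taken from the IBM dataset, and the function name `fit_binary_logit` is hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_binary_logit(groups, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on grouped data.

    groups: list of (x, leavers, total), where x is 0 or 1.
    Returns the fitted intercept b0 and coefficient b1.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, leavers, total in groups:
            p = sigmoid(b0 + b1 * x)
            resid = leavers - total * p   # observed minus expected leavers
            w = total * p * (1.0 - p)     # binomial variance weight
            g0 += resid
            g1 += x * resid
            h00 += w
            h01 += x * w
            h11 += x * x * w
        # Solve the 2x2 Newton system H * delta = g in closed form.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical counts (NOT from the IBM dataset):
# x = 0: rare travelers, 20 of 100 left; x = 1: frequent travelers, 30 of 100 left.
groups = [(0, 20, 100), (1, 30, 100)]
b0, b1 = fit_binary_logit(groups)

odds_ratio_from_model = math.exp(b1)
odds_ratio_from_counts = (30 / 70) / (20 / 80)
print(odds_ratio_from_model, odds_ratio_from_counts)
```

With a single binary predictor, the exponentiated coefficient coincides with the odds ratio computed directly from the two-by-two table of counts, so the two printed values agree. Note that an odds ratio compares odds, not attrition rates, so it should not be read as a ratio of the rates themselves.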