In this study, we present a new R package, \texttt{rethnicity}, for predicting ethnicity from names. A bidirectional LSTM serves as the model, and Florida voter registration data serve as the training data. Special care was taken to preserve accuracy for minority groups by adjusting for class imbalance in the dataset. The trained models were exported to C++ and integrated with R using Rcpp. Finally, the package is compared with existing solutions in terms of availability, accuracy, and performance.