As Wordle, the word game published by the New York Times, has grown in popularity, more and more players have taken up the challenge. Guessing the correct answer depends not only on luck but also on various attributes of the target word.

For Problem 1, we first preprocessed the original data by removing and replacing abnormal records. We then built an ARIMA-based model to predict the number of reported results, with parameters p=0, d=1, q=1. The model gave [20337, 21673] as the prediction interval for the number of reported results on March 1, 2023. Next, we selected three attributes of the word: the frequency of word usage (FREQ), the information entropy of the word (WIE), and the number of repeated letters in the word (NRE). We then analyzed the correlation between these three attributes and the seven percentages of tries (1, 2, 3, 4, 5, 6, X). The results showed that FREQ was positively correlated with the number of tries, while WIE and NRE were negatively correlated with it.

For Problem 2, we built regression models based on the XGBoost algorithm to predict the distribution of the reported results. Using the three attributes selected in Problem 1, we trained seven regression models, named XGB1 to XGB7, one for each of the seven try categories. Because the percentage of 1-try results was small and XGB1 predicted poorly, we used the mean of the 1-try data, 0.5, as the XGB1 prediction. The XGB2 to XGB7 models achieved prediction accuracies of 85.67%, 83.23%, 80.34%, 78.77%, 79.89%, and 84.63% respectively, for an overall accuracy of 82.1%. The percentages of (1, 2, 3, 4, 5, 6, X) for the word "EERIE" were predicted to be 0.5, 2.3, 13.8, 21.7, 29.4, 22.3, and 10.
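As a rough illustration of two of the word attributes, the sketch below computes NRE and WIE for "EERIE" and checks the arithmetic of the reported figures. The abstract does not give the exact definitions, so both are assumptions here: NRE is taken as the number of letters beyond the first occurrence of each distinct letter, and WIE as the Shannon entropy of the letter distribution inside the word. The function names are hypothetical. FREQ is omitted because it depends on an external corpus that the abstract does not name.

```python
import math
from collections import Counter


def repeated_letter_count(word: str) -> int:
    """Assumed NRE: letters beyond the first occurrence of each distinct letter."""
    return len(word) - len(set(word.lower()))


def letter_entropy(word: str) -> float:
    """Assumed WIE: Shannon entropy (in bits) of the word's letter distribution."""
    counts = Counter(word.lower())
    n = len(word)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Figures quoted in the abstract, in percent.
xgb_accuracy = [85.67, 83.23, 80.34, 78.77, 79.89, 84.63]  # XGB2 .. XGB7
eerie_pred = [0.5, 2.3, 13.8, 21.7, 29.4, 22.3, 10.0]      # (1, 2, 3, 4, 5, 6, X)

nre = repeated_letter_count("EERIE")
wie = letter_entropy("EERIE")
mean_accuracy = sum(xgb_accuracy) / len(xgb_accuracy)
total_share = sum(eerie_pred)

print(f"NRE = {nre}, WIE = {wie:.3f} bits")
print(f"mean accuracy of XGB2..XGB7 = {mean_accuracy:.1f}%")
print(f"sum of predicted EERIE percentages = {total_share:.1f}")
```

Under these assumed definitions, "EERIE" has two repeated letters and a low letter entropy, and the quoted figures are internally consistent: the six per-model accuracies average to the reported 82.1%, and the seven predicted percentages sum to 100.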