Accurately predicting sports viewership is crucial for optimizing ad sales and revenue forecasting. Social media platforms, such as Reddit, provide a wealth of user-generated content that reflects audience engagement and interest. In this study, we propose a regression-based approach to predict sports viewership using social media metrics, including post counts, comments, scores, and sentiment analysis from TextBlob and VADER. Through iterative improvements, such as focusing on major sports subreddits, incorporating categorical features, and handling outliers by sport, the model achieved an $R^2$ of 0.99, a Mean Absolute Error (MAE) of 1.27 million viewers, and a Root Mean Squared Error (RMSE) of 2.33 million viewers on the full dataset. These results demonstrate the model's ability to accurately capture patterns in audience behavior, offering significant potential for pre-event revenue forecasting and targeted advertising strategies.
翻译:暂无翻译