Water resources underpin human livelihoods and economic development, and they are closely linked to public health and environmental well-being. Accurate prediction of water quality is central to improving water resource management and combating pollution. This study uses several performance metrics to evaluate five models (linear regression, Random Forest, XGBoost, LightGBM, and an MLP neural network) for forecasting pH values in Georgia, USA. LightGBM attains the highest average precision among the models examined, and the tree-based models demonstrate a clear advantage in addressing regression problems. The performance of the MLP neural network, by contrast, is sensitive to feature scaling. We also analyze why these machine learning models achieve higher precision than the original study, which accounts for temporal dependencies and spatial considerations. The primary objective of this work is to establish a robust predictive pipeline tailored to practical applications, one that serves not only readers well versed in data science but also those without specialization in a particular application domain. In essence, we offer a fresh perspective on achieving relative precision in data science methodologies, emphasizing both prediction accuracy and interpretability.