GPU-centric AI data centers have adopted liquid cooling to handle extreme heat loads, but coolant leaks waste substantial energy through unplanned shutdowns and extended repair periods. We present a proof-of-concept smart IoT monitoring system that combines LSTM neural networks for probabilistic leak forecasting with Random Forest classifiers for instant detection. Tested on synthetic data aligned with ASHRAE 2021 standards, our approach achieves 96.5% detection accuracy and 87% forecasting accuracy at 90% probability within ±30-minute windows. Our analysis indicates that humidity, pressure, and flow rate deliver strong predictive signals, while temperature shows minimal immediate response due to thermal inertia in server hardware. The system uses MQTT streaming, InfluxDB storage, and Streamlit dashboards, forecasting leaks 2-4 hours ahead and identifying sudden events within 1 minute. For a typical 47-rack facility, this approach could prevent roughly 1,500 kWh of annual energy waste through proactive maintenance rather than reactive emergency procedures. Although validation remains synthetic-only, the results establish feasibility for future operational deployment in sustainable data center operations.
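The forecasting figure above is stated at a 90% probability level within ±30-minute windows. The abstract does not specify how that score is computed, so the following is only a minimal sketch of one plausible reading: a forecast counts when its predicted probability is at least 0.9, and it is correct when its predicted leak time falls within 30 minutes of an actual leak. The function name `windowed_forecast_accuracy`, the `(predicted_time, probability)` input format, and the sample timestamps are all hypothetical and are not taken from the system described here.

```python
from datetime import datetime, timedelta


def windowed_forecast_accuracy(forecasts, actual_leaks,
                               prob_threshold=0.9,
                               window=timedelta(minutes=30)):
    """Share of confident forecasts that land within `window` of a real leak.

    forecasts: list of (predicted_time, probability) pairs (assumed format).
    actual_leaks: list of datetimes at which leaks actually occurred.
    Returns None when no forecast clears the probability threshold.
    """
    # Keep only forecasts issued at or above the probability threshold.
    confident = [t for t, p in forecasts if p >= prob_threshold]
    if not confident:
        return None
    # A confident forecast is a hit if any actual leak lies within the window.
    hits = sum(
        1 for t in confident
        if any(abs(t - leak) <= window for leak in actual_leaks)
    )
    return hits / len(confident)


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0)  # illustrative timestamp only
    leaks = [base, base + timedelta(hours=6)]
    forecasts = [
        (base + timedelta(minutes=20), 0.95),           # hit: 20 min after first leak
        (base + timedelta(hours=5, minutes=15), 0.92),  # miss: 45 min before second leak
        (base + timedelta(hours=3), 0.60),              # ignored: below threshold
    ]
    print(windowed_forecast_accuracy(forecasts, leaks))  # prints 0.5
```

Under this reading the denominator is the number of confident forecasts, so the score behaves like a precision measure. A recall-style variant would instead divide by the number of actual leaks.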