This paper examines the use of risk models to predict the timing and location of wildfires caused by electricity infrastructure. Our data include historical ignition and wire-down events triggered by grid infrastructure in Pacific Gas and Electric (PG&E) territory between 2015 and 2019. They also include weather and vegetation data and very-high-resolution grid infrastructure data covering location, age, and materials. With these data we explore a range of machine learning methods and strategies for managing class imbalance in the training data. The best area under the receiver operating characteristic curve (AUC) we obtain is 0.776 for distribution feeder ignitions and 0.824 for transmission line wire-down events, both using the histogram-based gradient boosting tree algorithm (HGB) with under-sampling. We then use these models to identify which information provides the most predictive value. After line length, weather and vegetation features dominate the top-ranked features for ignition or wire-down risk. Distribution ignition models depend more on slow-varying vegetation variables such as burning index, energy release component, and tree height, whereas transmission wire-down models rely more on primary weather variables such as wind speed and precipitation. These results point to the importance of improved vegetation modeling for feeder ignition risk models and improved weather forecasting for transmission wire-down models. We also observe that infrastructure features make small but meaningful improvements to the predictive power of the risk models.
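The abstract does not say how the under-sampling was done or how the AUC was computed. As a minimal sketch, assuming random under-sampling of the non-event (majority) class down to a fixed negative-to-positive ratio, the Python below illustrates both steps with only the standard library. The names `undersample`, `roc_auc`, `ratio`, and `seed` are hypothetical and are not the authors' code. In a full pipeline one would fit a classifier such as scikit-learn's `HistGradientBoostingClassifier` on the under-sampled training split and then compute AUC on a held-out split that keeps its original imbalance.

```python
import random
from typing import List, Sequence, Tuple


def undersample(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    ratio: float = 1.0,
    seed: int = 0,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Hypothetical helper: keep every positive row (y == 1) and a random
    subset of negative rows (y == 0) so that there are at most `ratio`
    negatives per positive. Intended for the training split only."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    # Number of negatives to keep, capped by how many negatives exist.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = pos + rng.sample(neg, n_keep)
    rng.shuffle(kept)  # avoid all positives sitting at the front
    return [X[i] for i in kept], [y[i] for i in kept]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive scores higher than a randomly chosen
    negative, with ties counted as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Synthetic, illustrative data: 5 rare events among 100 rows.
    X = [[float(i)] for i in range(100)]
    y = [1 if i % 20 == 0 else 0 for i in range(100)]
    Xs, ys = undersample(X, y, ratio=1.0)
    print(sum(ys), len(ys))  # 5 10
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC depends only on how the scores rank positives against negatives, it can be computed directly on the imbalanced test set, which is why under-sampling is applied to the training split only. The pairwise loop is quadratic in the number of rows and is meant for illustration; a rank-based computation scales better on large datasets.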