In a Big Data environment, one pressing challenge facing engineers is reliability analysis for a large fleet of heterogeneous repairable systems with covariates. In addition to static covariates, which include time-invariant system attributes such as nominal operating conditions and geo-locations, recent advances in sensing technologies have made it possible to obtain dynamic sensor measurements of system operating and environmental conditions. As is common practice in the Big Data environment, the massive reliability data are typically stored in distributed storage systems. Leveraging the power of modern statistical learning, this paper investigates a statistical approach that integrates the Random Forests algorithm with classical data analysis methodologies for repairable system reliability, such as the nonparametric estimator for the Mean Cumulative Function and parametric models based on the Nonhomogeneous Poisson Process. We show that the proposed approach effectively addresses some common challenges arising in practice, including system heterogeneity, covariate selection, model specification, and data locality due to distributed data storage. The large-sample properties and the uniform consistency of the proposed estimator are established. Two numerical examples and a case study are presented to illustrate the application of the proposed approach. The strengths of the proposed approach are demonstrated by comparison studies.
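For readers unfamiliar with the Mean Cumulative Function (MCF), the following is a minimal sketch of the standard textbook nonparametric MCF estimator for a fleet of repairable systems. It is not the Random Forests-based approach proposed in the paper. The function name `mcf_estimate` and the input layout (one list of failure times and one observation end time per system) are assumptions made for illustration only. At each distinct failure time, the estimate increases by the number of failures at that time divided by the number of systems still under observation.

```python
from collections import Counter


def mcf_estimate(event_times, end_times):
    """Nonparametric MCF estimate for a fleet of repairable systems.

    event_times: list of lists; event_times[i] holds the failure times of system i.
    end_times:   list; end_times[i] is the time at which observation of system i ends.
    Assumes every failure time lies within its own system's observation window,
    so at least one system is under observation at each failure time.
    Returns a list of (time, estimated MCF) pairs at the distinct failure times.
    """
    # Number of failures observed across the fleet at each distinct time.
    failures_at = Counter(t for times in event_times for t in times)

    estimate = []
    cumulative = 0.0
    for t in sorted(failures_at):
        # Systems still under observation at time t.
        at_risk = sum(1 for end in end_times if end >= t)
        # Average number of failures per observed system at time t.
        cumulative += failures_at[t] / at_risk
        estimate.append((t, cumulative))
    return estimate


if __name__ == "__main__":
    # Hypothetical fleet of three systems (made-up data, illustration only).
    events = [[2, 5], [3], [5, 8]]
    ends = [6, 4, 10]
    for t, m in mcf_estimate(events, ends):
        print(t, round(m, 4))
```

The estimator treats all systems as exchangeable; handling heterogeneity through covariates is what the approach described in the abstract adds on top of this baseline.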