Classification models are a fundamental component of physical-asset management technologies such as structural health monitoring (SHM) systems and digital twins. Previous work introduced \textit{risk-based active learning}, an online approach to developing statistical classifiers that accounts for the decision-support context in which they are applied. The decision context is incorporated by preferentially querying data labels according to the \textit{expected value of perfect information} (EVPI). Although a risk-based active learning approach brings several benefits, including improved decision-making performance, the algorithms suffer from sampling bias as a result of the guided querying process. This sampling bias ultimately manifests as a decline in decision-making performance during the later stages of active learning, which in turn corresponds to lost resource or utility. The current paper proposes two novel approaches to counteract the effects of sampling bias: \textit{semi-supervised learning} and \textit{discriminative classification models}. These approaches are first visualised using a synthetic dataset and then applied to an experimental case study, the Z24 Bridge dataset. The semi-supervised learning approach is shown to have variable performance, with robustness to sampling bias dependent on how well the generative distributions selected for the model suit each dataset. In contrast, the discriminative classifiers are shown to have excellent robustness to the effects of sampling bias. Moreover, it is found that the number of inspections made during a monitoring campaign, and therefore resource expenditure, can be reduced by careful selection of the statistical classifiers used within a decision-supporting monitoring system.
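As a rough illustration of the EVPI-guided querying mentioned above, the following minimal sketch computes the EVPI for a single observation in a one-step decision problem. It is an assumption-laden simplification rather than the formulation used in the paper: the function names `evpi` and `should_query`, the two-state/two-action utility matrix, and the fixed inspection cost are all hypothetical and chosen only for illustration.

```python
import numpy as np


def evpi(posterior, utility):
    """Expected value of perfect information for one observation.

    posterior: class probabilities p(y | x), shape (n_states,)
    utility:   utility[a, y] of taking action a when the true state is y,
               shape (n_actions, n_states)  (hypothetical values)
    """
    posterior = np.asarray(posterior, dtype=float)
    utility = np.asarray(utility, dtype=float)

    # Expected utility of each action under the current (uncertain) belief.
    expected_per_action = utility @ posterior
    best_without_info = expected_per_action.max()

    # If the true state were known, the best action could be chosen per state.
    best_with_info = posterior @ utility.max(axis=0)

    return best_with_info - best_without_info


def should_query(posterior, utility, inspection_cost):
    """Query a label only when the information is worth more than it costs."""
    return evpi(posterior, utility) > inspection_cost


# Hypothetical utilities: rows are actions, columns are states.
#                    healthy  damaged
UTILITY = np.array([[0.0, -100.0],    # action 0: do nothing
                    [-10.0, -10.0]])  # action 1: repair

if __name__ == "__main__":
    for p_damaged in (0.0, 0.1, 0.5):
        belief = [1.0 - p_damaged, p_damaged]
        print(p_damaged, evpi(belief, UTILITY),
              should_query(belief, UTILITY, inspection_cost=3.0))
```

In this sketch, a label is requested only when the belief is uncertain enough that knowing the true state could change the chosen action. Queried points therefore cluster where decisions are sensitive to the label, which is one way to picture how guided querying can produce a labelled set that is not representative of the underlying data distribution.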