Statistical samples, in order to be representative, have to be drawn from a population in a random and unbiased way. Nevertheless, it is common practice in the field of model-based diagnosis to make estimations from (biased) best-first samples. One example is the computation of a few of the most probable fault explanations for a defective system and the use of these to assess which aspect of the system, if measured, would yield the highest information gain. In this work, we scrutinize whether these statistically unfounded conventions, which both diagnosis researchers and practitioners have adhered to for decades, are indeed reasonable. To this end, we empirically analyze various sampling methods that generate fault explanations. We study the representativeness of the produced samples in terms of the estimates they yield about fault explanations and how well they guide diagnostic decisions, and we investigate the impact of sample size, the optimal trade-off between sampling efficiency and effectiveness, and how approximate sampling techniques compare to exact ones.
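To make the setting concrete, the following is a minimal sketch (not the implementation used in this work) of how a sample of fault explanations is commonly used to estimate the information gain of a candidate measurement: the sample's probabilities are renormalized, and the measurement is scored by the expected reduction in entropy over the sampled explanations. The function names, the (diagnosis, probability) representation, and the assumption that every explanation deterministically predicts a binary measurement outcome are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a normalized probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(sample, predicts_ok):
    """Estimate the information gain of one measurement from a sample of
    fault explanations (diagnoses).

    sample      -- list of (diagnosis, probability) pairs; probabilities are
                   renormalized over the sample, as is done when only a
                   best-first subset of all explanations is available
    predicts_ok -- function mapping a diagnosis to True if it predicts a
                   nominal ('ok') outcome for the measurement (assumed to be
                   deterministic and binary for this sketch)
    """
    total = sum(p for _, p in sample)
    probs = [p / total for _, p in sample]          # renormalize over the sample
    p_ok = sum(p for (d, _), p in zip(sample, probs) if predicts_ok(d))
    p_nok = 1.0 - p_ok
    prior_entropy = entropy(probs)

    # Posterior entropy for each outcome: explanations inconsistent with the
    # observed outcome are discarded, the remaining ones are renormalized.
    def posterior_entropy(keep):
        kept = [p for (d, _), p in zip(sample, probs) if keep(d)]
        mass = sum(kept)
        return entropy([p / mass for p in kept]) if mass > 0 else 0.0

    h_ok = posterior_entropy(lambda d: predicts_ok(d))
    h_nok = posterior_entropy(lambda d: not predicts_ok(d))

    # Expected entropy after the measurement, weighted by outcome probability.
    expected_posterior = p_ok * h_ok + p_nok * h_nok
    return prior_entropy - expected_posterior
```

Note that when the sample is a biased best-first subset rather than a random draw, the renormalized probabilities may misrepresent the true posterior over fault explanations; whether such estimates nevertheless guide diagnostic decisions well is precisely the question examined in this work.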