In recent years, the use of ensemble learning in applications has grown significantly, as increasing computational power allows large ensembles to be trained in reasonable time frames. Many applications, e.g., malware detection, face recognition, or financial decision-making, use a finite set of learning algorithms and aggregate them so that the predictive performance is better than that of any individual learning algorithm. In the field of Post-Silicon Validation (PSV) for semiconductor devices, the data sets provided typically cover various devices, e.g., chips from different manufacturing lines. In PSV, the task is to approximate the underlying function of the data with multiple learning algorithms, each trained on a device-specific subset, rather than to improve the performance of arbitrary classifiers on the entire data set. Furthermore, an unknown number of subsets is expected to describe functions with very different characteristics. The corresponding ensemble members, called outliers, can heavily influence the approximation. Our method aims to find a suitable approximation that is robust to outliers and represents the best or worst case in a way that applies to as many types as possible. A 'soft-max' or 'soft-min' function is used in place of a maximum or minimum operator. A Neural Network (NN) is trained to learn this 'soft-function' in a two-stage process. First, we select a subset of ensemble members that is representative of the best or worst case. Second, we combine these members and define a weighting that uses the properties of the Local Outlier Factor (LOF) to increase the influence of non-outliers and decrease that of outliers. The weighting ensures robustness to outliers and makes sure that the approximations are suitable for most types.
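To make the idea of an LOF-weighted soft-max or soft-min concrete, the following is a minimal sketch and not the method of the paper. It assumes that each ensemble member is represented by its predictions on a shared grid of inputs, that the soft-function is a weighted log-sum-exp with a sharpness parameter `alpha`, and that the weight of a member is `1 / max(1, LOF)`. The function names, the weight formula, `alpha`, `k`, and the toy data are all hypothetical, and the NN training step described above is omitted.

```python
import math


def lof_scores(points, k=2):
    """Local Outlier Factor for a small list of distinct, equal-length vectors.

    Plain textbook LOF with exactly k neighbours per point (ties are broken
    by sort order). Assumes no duplicate points, which would give zero
    reachability distances.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neigh = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Distance from each point to its k-th nearest neighbour.
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]

    def local_reach_density(i):
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        return len(reach) / sum(reach)

    lrd = [local_reach_density(i) for i in range(n)]
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]


def soft_extreme(values, weights, alpha):
    """Weighted log-sum-exp: soft-max for alpha > 0, soft-min for alpha < 0."""
    shift = max(alpha * v for v in values)  # subtract for numerical stability
    total = sum(w * math.exp(alpha * v - shift) for v, w in zip(values, weights))
    return (shift + math.log(total / sum(weights))) / alpha


def lof_weights(scores):
    """Hypothetical weighting: full weight for inliers, less for high LOF."""
    return [1.0 / max(1.0, s) for s in scores]


# Toy data (assumption): predictions of five members on three grid points.
# The last member behaves very differently from the other four.
members = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [0.9, 1.9, 2.9],
    [1.2, 2.2, 3.2],
    [2.0, 3.5, 4.5],
]

weights = lof_weights(lof_scores(members, k=2))
uniform = [1.0] * len(members)

for g in range(len(members[0])):
    column = [m[g] for m in members]
    print(
        f"grid {g}: "
        f"unweighted soft-max = {soft_extreme(column, uniform, 2.0):.3f}, "
        f"LOF-weighted soft-max = {soft_extreme(column, weights, 2.0):.3f}"
    )
```

In this sketch the deviating member receives a high LOF score and therefore a small weight, so the weighted soft-max is pulled less toward it than the unweighted one. Passing a negative `alpha` gives the corresponding soft-min. Note that a log-sum-exp only shifts a member's contribution by `log(w) / alpha`, so for large `|alpha|` a sufficiently extreme member still dominates; this is one reason a separate selection stage, as described in the abstract, is useful.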