In supervised classification problems, the test set may contain data points belonging to classes not observed in the learning phase. Moreover, the units in the test data may be measured on a set of additional variables recorded after the learning sample was collected. In this situation, the classifier built in the learning phase needs to adapt to handle both the potential unknown classes and the extra dimensions. We introduce a model-based discriminant approach, Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA), which can detect unobserved classes and adapt to the increasing dimensionality. Model estimation is carried out via a full inductive approach based on an EM algorithm. The method is then embedded in a more general framework for adaptive variable selection and classification suitable for high-dimensional data. A simulation study and an artificial experiment on the classification of adulterated honey samples are used to validate the ability of the proposed framework to deal with complex situations.