Analysis of high-dimensional data can offer a detailed description of a system, but it is often hampered by the curse of dimensionality. General dimensionality reduction techniques can alleviate this difficulty by extracting a few important features, but the extracted features are hard to interpret and hard to connect to actual decisions about individual physical variables. Variable selection techniques, as an alternative, preserve interpretability, but they often rely on a greedy search that can fail to capture important interactions. This research proposes a new method that uses a randomized search to generate subspaces, that is, reduced-dimensional spaces of physical variables, and forms an ensemble of models for the critical subspaces. When applied to high-dimensional data collected from a composite metal development process, the proposed method shows superior performance in both prediction and important variable selection.
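To make the idea of a randomized subspace search with an ensemble over the best subspaces more concrete, the following is a minimal sketch and not the method proposed here. It assumes linear least-squares base models, a single holdout split for scoring, a fixed subspace size, and simple averaging of predictions; none of these choices come from the abstract. The names `fit_subspace_ensemble`, `predict`, and `selection_frequency`, as well as the synthetic data, are hypothetical.

```python
import numpy as np


def _design(X, cols):
    """Design matrix with an intercept and only the columns in `cols`."""
    return np.column_stack([np.ones(X.shape[0]), X[:, list(cols)]])


def fit_subspace_ensemble(X, y, subspace_size=2, n_draws=500, n_keep=5, seed=0):
    """Randomly draw variable subsets, score each, and keep the best few.

    Assumptions (not from the source): linear least-squares base models,
    one holdout split for scoring, and a fixed subspace size.
    Returns a list of (columns, coefficients) sorted by holdout error.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    perm = rng.permutation(n)
    val, train = perm[: n // 3], perm[n // 3:]

    scored = {}
    for _ in range(n_draws):
        drawn = rng.choice(p, size=subspace_size, replace=False)
        cols = tuple(sorted(int(c) for c in drawn))
        if cols in scored:
            continue  # this subspace was already evaluated
        coef, *_ = np.linalg.lstsq(_design(X[train], cols), y[train], rcond=None)
        err = float(np.mean((_design(X[val], cols) @ coef - y[val]) ** 2))
        scored[cols] = (err, coef)

    ranked = sorted(scored.items(), key=lambda item: item[1][0])
    return [(cols, coef) for cols, (err, coef) in ranked[:n_keep]]


def predict(models, X_new):
    """Average the predictions of the retained subspace models."""
    preds = [_design(X_new, cols) @ coef for cols, coef in models]
    return np.mean(preds, axis=0)


def selection_frequency(models, n_variables):
    """Fraction of retained subspaces in which each variable appears."""
    counts = np.zeros(n_variables)
    for cols, _ in models:
        counts[list(cols)] += 1
    return counts / len(models)


# Synthetic illustration: only variables 0 and 1 drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=300)

models = fit_subspace_ensemble(X, y)
print("best subspace:", models[0][0])
print("selection frequency:", selection_frequency(models, X.shape[1]))
```

Because every model in this sketch is fit on original variables rather than on derived features, the selection frequencies can be read directly as a rough importance ranking of those variables.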