Various unsupervised greedy selection methods have been proposed as computationally tractable approximations to the NP-hard subset selection problem. These methods rely on sequentially selecting the variables that best improve performance with respect to a selection criterion. Theoretical results exist that provide performance bounds and enable efficient "lazy greedy" implementations for selection criteria that satisfy a diminishing returns property known as submodularity. This has motivated the development of variable selection algorithms based on mutual information and frame potential. Recently, the authors introduced Forward Selection Component Analysis (FSCA), which uses variance explained as its selection criterion. While this criterion is not submodular, FSCA has been shown to be highly effective for applications such as measurement plan optimisation. In this paper, a "lazy" implementation of the FSCA algorithm (L-FSCA) is proposed, which, although not equivalent to FSCA due to the absence of submodularity, has the potential to yield comparable performance while being up to an order of magnitude faster to compute. The efficacy of L-FSCA is demonstrated by performing a systematic comparison with FSCA and five other unsupervised variable selection methods from the literature using simulated and real-world case studies. Experimental results confirm that L-FSCA yields almost identical performance to FSCA while reducing computation time by between 22% and 94% for the case studies considered.
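The contrast between plain greedy selection and its "lazy" variant can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`variance_gain`, `deflate`, `greedy_fsca`, `lazy_fsca`) are hypothetical, and the sketch assumes that variance explained is computed by deflating a mean-centred data matrix after each selection and that a standard max-heap lazy evaluation scheme is used. Because variance explained is not submodular, the stale gains held in the heap are not guaranteed upper bounds, so the lazy version may select a different subset from the plain greedy one.

```python
import heapq
import numpy as np


def variance_gain(R, j, eps=1e-12):
    """Variance of the residual matrix R explained by regressing on column j."""
    r = R[:, j]
    denom = r @ r
    if denom <= eps:  # column already (numerically) fully explained
        return 0.0
    return float(np.sum((R.T @ r) ** 2) / denom)


def deflate(R, j, eps=1e-12):
    """Remove from every column of R the component lying along column j."""
    r = R[:, j]
    denom = r @ r
    if denom <= eps:
        return R
    return R - np.outer(r, r @ R) / denom


def greedy_fsca(X, k):
    """Plain greedy selection: re-evaluate every candidate at each step."""
    R = X - X.mean(axis=0)
    selected = []
    for _ in range(k):
        gains = [
            -1.0 if j in selected else variance_gain(R, j)
            for j in range(R.shape[1])
        ]
        best = int(np.argmax(gains))
        selected.append(best)
        R = deflate(R, best)
    return selected


def lazy_fsca(X, k):
    """Lazy greedy selection (heuristic here, as the criterion is not submodular).

    Gains from earlier steps are kept in a max-heap. Only the top candidate is
    re-evaluated; it is accepted if its refreshed gain still beats the next
    (possibly stale) gain in the heap, otherwise it is pushed back.
    """
    R = X - X.mean(axis=0)
    heap = [(-variance_gain(R, j), j) for j in range(R.shape[1])]
    heapq.heapify(heap)
    selected = []
    while heap and len(selected) < k:
        _, j = heapq.heappop(heap)
        gain = variance_gain(R, j)  # refresh against the current residual
        if not heap or gain >= -heap[0][0]:
            selected.append(j)
            R = deflate(R, j)
        else:
            heapq.heappush(heap, (-gain, j))
    return selected
```

Calling `lazy_fsca(X, k)` on an observations-by-variables array `X` returns the indices of `k` selected columns. Any saving comes from skipping re-evaluation of candidates whose stale gain is already below that of the refreshed top candidate; how large that saving is depends on the data.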