We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search (AES-K) method for selecting variables in linear regression. Under the assumption that the optimal combination of explanatory variables is K-sparse, these methods test all K-sparse combinations of variables exhaustively. By collecting the results of the exhaustive ES-K computation, the outcomes of various approximate methods for sparse variable selection can be summarized in terms of a density of states. With this density of states, we can compare different approaches to sparse variable selection, such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is severe, the AES-K method enables the density of states to be reconstructed effectively by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this difficulty is caused by a shortage of data.
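The exhaustive step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `es_k`, the use of in-sample mean squared error as the score, the synthetic data, and the histogram binning are all assumptions made for illustration. The abstract does not specify the error measure, and the AES-K stage (replica-exchange Monte Carlo plus the multiple histogram method) is not covered here.

```python
from itertools import combinations

import numpy as np


def es_k(X, y, K):
    """Fit ordinary least squares on every K-column subset of X.

    Returns a list of (subset, mse) pairs sorted by ascending error.
    The in-sample MSE is an assumed score; other criteria could be used.
    """
    n_features = X.shape[1]
    results = []
    for subset in combinations(range(n_features), K):
        Xs = X[:, list(subset)]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        mse = float(np.mean((y - Xs @ coef) ** 2))
        results.append((subset, mse))
    results.sort(key=lambda item: item[1])
    return results


# Hypothetical toy data: 8 candidate variables, only columns 1 and 4 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.1 * rng.standard_normal(50)

results = es_k(X, y, K=2)
best_subset, best_mse = results[0]

# Density of states: how many K-sparse combinations fall in each error bin.
errors = np.array([mse for _, mse in results])
counts, bin_edges = np.histogram(errors, bins=10)

print("best subset:", best_subset)
print("density of states (counts per error bin):", counts)
```

The histogram of errors over all combinations plays the role of the density of states: the subset returned by any approximate selection method can be placed on this error axis to see how close it lies to the exhaustive optimum. The number of combinations grows as the binomial coefficient of the variable count and K, which is why a sampling-based reconstruction becomes necessary for large problems.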