We consider the task of feature selection for reconstruction, which consists in choosing a small subset of features from which whole data instances can be reconstructed. This task is of particular importance in contexts involving, for example, costly physical measurements, sensor placement, or information compression. To break the intrinsic combinatorial nature of this problem, we formulate the task as optimizing a distribution over binary masks that enables accurate reconstruction. We then face two main challenges. The first concerns differentiability issues due to the binary distribution. The second is the elimination of redundant information by selecting variables in a correlated fashion, which requires modeling the covariance of the binary distribution. We address both issues by introducing a relaxation of the problem via a novel reparameterization of the logit-normal distribution. Through evaluation on several high-dimensional image benchmarks, we demonstrate that the proposed method provides an effective exploration scheme and leads to efficient feature selection for reconstruction. We also show that the method leverages the intrinsic geometry of the data, facilitating reconstruction.
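To make the idea of a relaxed, correlated mask concrete, the following is a minimal sketch of sampling from a standard logit-normal distribution: a correlated Gaussian vector is drawn with the usual reparameterization z = mu + L·eps and squashed through a sigmoid, giving a mask in (0, 1)^d whose entries co-vary. This is not the novel reparameterization proposed in the paper, whose details the abstract does not give; the function names, the Cholesky-factor parameterization `chol`, and the toy values are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_relaxed_mask(mu, chol, rng):
    """Draw one relaxed mask m = sigmoid(mu + L @ eps), with eps ~ N(0, I).

    mu   : list of d floats (mean of the underlying Gaussian; hypothetical name)
    chol : d x d lower-triangular list of lists L, so the Gaussian covariance is L L^T
    rng  : random.Random instance, passed in for reproducibility
    """
    d = len(mu)
    eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
    mask = []
    for i in range(d):
        # Only entries j <= i of row i are used (lower-triangular factor).
        z = mu[i] + sum(chol[i][j] * eps[j] for j in range(i + 1))
        mask.append(sigmoid(z))
    return mask


def empirical_cov(samples, i, j):
    """Empirical covariance between mask entries i and j."""
    n = len(samples)
    mean_i = sum(s[i] for s in samples) / n
    mean_j = sum(s[j] for s in samples) / n
    return sum((s[i] - mean_i) * (s[j] - mean_j) for s in samples) / n


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.0, 0.0, 0.0]
    # Toy factor: features 0 and 1 are strongly anti-correlated, mimicking two
    # redundant features of which only one should be kept; feature 2 is independent.
    chol = [
        [2.0, 0.0, 0.0],
        [-1.9, 0.3, 0.0],
        [0.0, 0.0, 1.0],
    ]
    samples = [sample_relaxed_mask(mu, chol, rng) for _ in range(2000)]
    print("cov(m0, m1):", empirical_cov(samples, 0, 1))
    print("cov(m0, m2):", empirical_cov(samples, 0, 2))
```

Because the sample is a deterministic, differentiable function of `mu` and `chol` given `eps`, gradients of a reconstruction loss could in principle flow back to both parameters, and the off-diagonal entries of `chol` are what let the mask distribution express that two features should not be selected together.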