Selecting subsets of features that differentiate between two conditions is a key task in a broad range of scientific domains. In many applications, the features of interest form clusters with similar effects on the data at hand. To recover such clusters we develop DiSC, a data-driven approach for detecting groups of features that differentiate between conditions. For each condition, we construct a graph whose nodes correspond to the features and whose edge weights are functions of the similarity between them for that condition. We then apply a spectral approach to compute subsets of nodes whose connectivity differs significantly between the condition-specific feature graphs. On the theoretical front, we analyze our approach with a toy example based on the stochastic block model. We evaluate DiSC on a variety of datasets, including MNIST, hyperspectral imaging, simulated scRNA-seq and task fMRI, and demonstrate that DiSC uncovers features that differentiate between conditions better than competing methods.
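The pipeline described above (one feature graph per condition, then a spectral comparison of the two graphs) can be illustrated with a minimal sketch. This is not the DiSC algorithm itself: the abstract does not specify the similarity function or the spectral operator, so the sketch assumes absolute Pearson correlation as the edge weight and uses the dominant eigenvector of the difference of the two affinity matrices as a simple stand-in. The function names `feature_affinity` and `differential_scores` and the toy data are hypothetical.

```python
import numpy as np


def feature_affinity(X):
    """Feature graph for one condition.

    X has shape (samples, features). Nodes are features; edge weights are
    absolute Pearson correlations (an assumed similarity, not DiSC's).
    """
    W = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def differential_scores(X_a, X_b):
    """Score each feature by how much its connectivity differs between conditions.

    Stand-in spectral step (an assumption): take the eigenvector of
    W_a - W_b whose eigenvalue has the largest magnitude.
    """
    diff = feature_affinity(X_a) - feature_affinity(X_b)  # symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(diff)
    k = np.argmax(np.abs(eigvals))
    return np.abs(eigvecs[:, k])  # abs removes the eigenvector sign ambiguity


# Toy data: 20 features, 500 samples per condition.
rng = np.random.default_rng(0)
X_a = rng.standard_normal((500, 20))
X_b = rng.standard_normal((500, 20))

# In condition A only, features 0..4 share a latent factor and form a cluster.
z = rng.standard_normal((500, 1))
X_a[:, :5] = z + 0.5 * rng.standard_normal((500, 5))

scores = differential_scores(X_a, X_b)
top = np.argsort(scores)[-5:]  # indices of the five highest-scoring features
print(sorted(top.tolist()))
```

In this toy setup the planted cluster is the only structure that differs between the two graphs, so its features should receive the highest scores. On real data the choice of similarity and of spectral operator matters considerably, and the simple difference used here should be read only as an illustration of the idea.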