Classification problems in which the number of available labeled samples is small relative to their dimension are very common. Such settings are typically underdetermined and carry a high risk of overfitting. To improve the generalization ability of trained classifiers, common solutions include using priors about the data distribution. Among the many options, priors on data structure, often represented through graphs, are increasingly popular. In this paper, we introduce a generic model in which observed class signals are assumed to be corrupted by two sources of noise: one isotropic and independent of the underlying graph structure, the other colored by a known graph operator. Under this model, we derive an optimal methodology to classify such signals. Interestingly, this methodology involves a single parameter, making it particularly suitable for cases where available data is scarce. Using various real datasets, we show that the proposed model can be applied in real-world scenarios, resulting in higher generalization accuracy than popular alternatives.
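To make the two-noise model concrete, the following is a minimal sketch rather than the paper's actual derivation. It assumes Gaussian noise, equal class priors, and a covariance of the form (1 - alpha) I + alpha S Sᵀ, where `S` stands for the known graph operator and `alpha` for the single parameter. All function names, and this particular parametrization, are hypothetical choices made for illustration. Under these assumptions the best decision rule reduces to picking the class mean that is nearest in Mahalanobis distance.

```python
import numpy as np


def graph_noise_covariance(S, alpha):
    """Assumed covariance: isotropic part plus a part colored by the graph operator S.

    alpha in [0, 1) is the single mixing parameter (hypothetical parametrization).
    """
    n = S.shape[0]
    return (1.0 - alpha) * np.eye(n) + alpha * (S @ S.T)


def fit_class_means(X, y):
    """Estimate one mean signal per class from labeled samples (rows of X)."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, means


def classify(X, classes, means, S, alpha):
    """Assign each row of X to the class whose mean is nearest in Mahalanobis distance."""
    n = X.shape[1]
    cov = graph_noise_covariance(S, alpha)
    # diffs[s, c, :] = X[s] - means[c]
    diffs = X[:, None, :] - means[None, :, :]
    # Solve cov @ z = diff for every (sample, class) pair instead of inverting cov.
    solved = np.linalg.solve(cov, diffs.reshape(-1, n).T).T.reshape(diffs.shape)
    d2 = np.einsum("scn,scn->sc", diffs, solved)
    return classes[np.argmin(d2, axis=1)]


# Toy usage: a 4-node path graph whose adjacency matrix plays the role of S.
S = np.array(
    [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
)
X_train = np.array(
    [[1.0, 0.0, 0.0, 0.0],
     [1.1, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 1.1]]
)
y_train = np.array([0, 0, 1, 1])
classes, means = fit_class_means(X_train, y_train)
predictions = classify(X_train, classes, means, S, alpha=0.5)
```

Because only `alpha` has to be tuned, it could be selected with a simple grid search on held-out samples, which is one way to read the abstract's remark about suitability when data is scarce.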