We consider a binary supervised classification problem in which, instead of data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$. Formally, we observe data $D_N = (\mu_1, Y_1), \ldots, (\mu_N, Y_N)$, where $\mu_i$ is a measure on $\mathcal{X}$ and $Y_i$ is a label in $\{0, 1\}$. Given a set $\mathcal{F}$ of base classifiers on $\mathcal{X}$, we build corresponding classifiers on the space of measures. We provide upper and lower bounds on the Rademacher complexity of this new class of classifiers, expressed simply in terms of the corresponding quantities for the class $\mathcal{F}$. If the measures $\mu_i$ are uniform over a finite set, this classification task reduces to a multi-instance learning problem. However, our approach allows more flexibility and diversity in the input data we can handle. While such a framework has many possible applications, this work focuses on classifying data via topological descriptors called persistence diagrams. These objects are discrete measures on $\mathbb{R}^2$, where the coordinates of each point correspond to the range of scales at which a topological feature exists. We present several classifiers on measures and show, both heuristically and theoretically, how they can achieve good classification performance on persistence diagrams in various settings.
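The abstract does not say how a base classifier on $\mathcal{X}$ is turned into a classifier on measures. As an illustration only, the sketch below assumes one natural construction, which is not necessarily the one studied in this work: given $f \in \mathcal{F}$ with values in $\{0, 1\}$ and a threshold $t$, predict $1$ for $\mu$ iff $\mu(\{x : f(x) = 1\}) \ge t$. Under this assumption, taking $\mu$ uniform on $n$ points and $t = 1/n$ recovers the standard multi-instance rule: a bag is positive iff at least one instance is positive. The names `measure_classifier`, `uniform_measure`, `long_lived`, and the parameter `threshold` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]               # e.g. a (birth, death) pair in R^2
Measure = Sequence[Tuple[Point, float]]   # discrete measure: (support point, weight) pairs


def uniform_measure(points: Sequence[Point]) -> List[Tuple[Point, float]]:
    """Uniform probability measure on a finite set of points (weight 1/n each)."""
    n = len(points)
    return [(p, 1.0 / n) for p in points]


def measure_classifier(base: Callable[[Point], int], threshold: float) -> Callable[[Measure], int]:
    """Hypothetical lift of a base classifier f: X -> {0, 1} to discrete measures.

    Assumed construction (illustrative only): predict 1 iff
    mu({x : f(x) = 1}) >= threshold.
    """
    def classify(measure: Measure) -> int:
        # Total mass that mu puts on the region where the base classifier outputs 1.
        positive_mass = sum(w for x, w in measure if base(x) == 1)
        return int(positive_mass >= threshold)

    return classify


# Hypothetical base classifier: flag points whose persistence (death - birth) exceeds 0.5.
def long_lived(x: Point) -> int:
    birth, death = x
    return int(death - birth > 0.5)


diagram = [(0.0, 0.2), (0.1, 1.0), (0.3, 0.4)]
mu = uniform_measure(diagram)

# threshold = 1/n: positive iff at least one point is positive (multi-instance rule).
mil_clf = measure_classifier(long_lived, threshold=1.0 / len(diagram))
# threshold = 1/2: positive iff at least half of the mass lies in the positive region.
majority_clf = measure_classifier(long_lived, threshold=0.5)

print(mil_clf(mu), majority_clf(mu))  # 1 0
```

Here only one of the three points has persistence above $0.5$, so the positive region carries mass $1/3$: the multi-instance rule fires, while the majority rule does not. Persistence diagrams are often treated as unnormalized sums of Dirac masses rather than probability measures; the uniform weights above are a simplifying assumption of this sketch.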