Most classifiers rely on discriminative boundaries that separate instances of each class from everything else. We argue that discriminative boundaries are counter-intuitive because they define semantics by what-they-are-not, and that they should be replaced by generative classifiers, which define semantics by what-they-are. Unfortunately, generative classifiers are significantly less accurate. This may be because generative models tend to focus on easy-to-model semantic generative factors while ignoring non-semantic factors that are important but difficult to model. We propose a new generative model in which semantic factors are accommodated by shell theory's hierarchical generative process, and non-semantic factors by an instance-specific noise term. We use the model to develop a classification scheme that suppresses the impact of noise while preserving semantic cues. The result is a surprisingly accurate generative classifier that takes the form of a modified nearest-neighbor algorithm; we term it distance classification. Unlike discriminative classifiers, a distance classifier defines semantics by what-they-are, is amenable to incremental updates, and scales well with the number of classes.
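To make the "what-they-are" and incremental-update properties concrete, the sketch below shows a plain nearest-class-mean classifier. It is an illustrative baseline only, not the distance classifier described above. It omits shell theory's generative model and the noise-suppression step, neither of which is specified in this abstract. The class name `NearestMeanClassifier` and its methods are hypothetical. Each class is summarized solely by its own samples, stored as a running sum and count. Adding a sample or an entirely new class therefore touches only that class's entry, and prediction cost grows linearly with the number of classes.

```python
import math


class NearestMeanClassifier:
    """Hypothetical nearest-class-mean baseline (NOT the paper's distance classifier).

    Each class is described only by its own samples (a running mean), i.e. by
    what it is, rather than by a boundary against every other class.
    """

    def __init__(self):
        self._sums = {}    # class label -> per-dimension sum of that class's samples
        self._counts = {}  # class label -> number of samples seen for that class

    def update(self, x, label):
        """Incrementally add one sample; statistics of other classes are untouched."""
        if label not in self._sums:
            self._sums[label] = [0.0] * len(x)
            self._counts[label] = 0
        self._sums[label] = [s + xi for s, xi in zip(self._sums[label], x)]
        self._counts[label] += 1

    def mean(self, label):
        """Return the current mean vector of one class."""
        n = self._counts[label]
        return [s / n for s in self._sums[label]]

    def predict(self, x):
        """Return the label whose class mean is closest in Euclidean distance."""
        return min(self._sums, key=lambda c: math.dist(x, self.mean(c)))


if __name__ == "__main__":
    clf = NearestMeanClassifier()
    clf.update([0.0, 0.0], "cat")
    clf.update([0.2, 0.0], "cat")
    clf.update([5.0, 5.0], "dog")
    print(clf.predict([0.1, 0.3]))    # closest to the "cat" mean
    clf.update([-5.0, 5.0], "bird")   # new class added without retraining the others
    print(clf.predict([-4.0, 4.5]))   # closest to the "bird" mean
```

In a discriminative model, adding the "bird" class would typically require retraining its decision boundaries. Here, by contrast, it only creates one new entry.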