Real-world machine learning systems need to analyze test data that may differ from training data. In K-way classification, this is crisply formulated as open-set recognition, core to which is the ability to discriminate open-set data outside the K closed-set classes. Two conceptually elegant ideas for open-set discrimination are: 1) discriminatively learning an open-vs-closed binary discriminator by exploiting some outlier data as the open-set, and 2) learning the closed-set data distribution unsupervised with a GAN, using its discriminator as the open-set likelihood function. However, the former generalizes poorly to diverse open test data because it overfits to the training outliers, which are unlikely to exhaustively span the open world. The latter does not work well, presumably due to the unstable training of GANs. Motivated by the above, we propose OpenGAN, which addresses the limitations of each approach by combining them with several technical insights. First, we show that a carefully selected GAN-discriminator on some real outlier data already achieves the state of the art. Second, we augment the available set of real open training examples with adversarially synthesized "fake" data. Third and most importantly, we build the discriminator over the features computed by the closed-world K-way networks. This allows OpenGAN to be implemented via a lightweight discriminator head built on top of an existing K-way network. Extensive experiments show that OpenGAN significantly outperforms prior open-set methods.