A Major Improvement to Conditional GANs: cGANs with Projection Discriminator

February 7, 2018 · CreateAMind

https://github.com/pfnet-research/sngan_projection




cGANs with Projection Discriminator 

Takeru Miyato, Masanori Koyama

03 Feb 2018 · ICLR 2018 Conference Blind Submission




Abstract: We propose a novel, projection-based way to incorporate conditional information into the discriminator of GANs that respects the role of the conditional information in the underlying probabilistic model. This approach contrasts with most conditional-GAN frameworks in use today, which inject the conditional information by concatenating the (embedded) conditional vector to the feature vectors. With this modification, we were able to significantly improve the quality of class-conditional image generation on the ILSVRC2012 (ImageNet) dataset over the current state-of-the-art result, and we achieved this with a single pair of a discriminator and a generator. We were also able to extend the application to super-resolution and succeeded in producing highly discriminative super-resolution images. The new structure also enables high-quality category transformation based on parametric functional transformation of the conditional batch normalization layers in the generator.
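In the projection formulation, the discriminator scores a pair (x, y) as D(x, y) = psi(phi(x)) + y^T V phi(x): an unconditional logit plus the inner product between a learned class embedding and the image features, rather than feeding a concatenation of features and embedded label through further layers. Below is a minimal PyTorch-style sketch of such an output head (illustrative only; the class name is ours, and the official implementation is in Chainer):

import torch
import torch.nn as nn

class ProjectionDiscriminatorHead(nn.Module):
    """Minimal sketch of a projection discriminator output head.
    A shared backbone phi(x) produces `features`; the label enters the
    score only through an inner product with a learned class embedding
    (the embedding matrix plays the role of V in the paper)."""
    def __init__(self, feature_dim, num_classes):
        super().__init__()
        self.psi = nn.Linear(feature_dim, 1)                 # unconditional term psi(phi(x))
        self.embed = nn.Embedding(num_classes, feature_dim)  # one embedding row per class

    def forward(self, features, labels):
        # features: (B, feature_dim) from the backbone; labels: (B,) class ids
        out = self.psi(features)
        proj = (self.embed(labels) * features).sum(dim=1, keepdim=True)  # y^T V phi(x)
        return out + proj

Restricting the label to a single inner product is what ties the discriminator to the log-density-ratio decomposition derived in the paper; concatenation, by contrast, lets the label influence the score arbitrarily.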




GitHub README:


NOTE: The setup and example code in this README are for training GANs on a single GPU. The models are smaller than the ones used in the papers. Please go to the link if you are looking for how to reproduce the results in the papers.

Official Chainer implementation for conditional image generation on the ILSVRC2012 dataset (ImageNet) with spectral normalization and projection discriminator.
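The spectral normalization half of the recipe keeps the discriminator's Lipschitz constant in check by dividing each weight matrix by an estimate of its largest singular value, maintained with a step of power iteration per update. A minimal sketch of that estimate, assuming a persistent vector u carried between calls (the repo implements this inside its Chainer layers; this is an illustration, not the repo's code):

import torch

def spectrally_normalized(W, u, n_power_iterations=1, eps=1e-12):
    # W: (out_dim, in_dim) weight matrix; u: persistent (out_dim,) vector.
    # Power iteration drives u and v toward the top singular vectors of W.
    with torch.no_grad():
        for _ in range(n_power_iterations):
            v = W.t() @ u
            v = v / (v.norm() + eps)
            u = W @ v
            u = u / (u.norm() + eps)
    sigma = torch.dot(u, W @ v)  # estimated largest singular value of W
    return W / sigma, u          # use W / sigma in the layer; keep u for the next step

In PyTorch this is also available off the shelf as torch.nn.utils.spectral_norm, which wraps a layer and performs the same estimate on every forward pass.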

Demo movies

Consecutive category morphing movies (a sketch of the interpolation behind them follows the links):

  • (5x5 panels 128px images) https://www.youtube.com/watch?v=q3yy5Fxs7Lc

  • (10x10 panels 128px images) https://www.youtube.com/watch?v=83D_3WXpPjQ
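The morphing in these movies exploits the mechanism mentioned in the abstract: the generator sees the class only through conditional batch normalization, so linearly interpolating the per-class scale and shift between two categories smoothly transforms the output. A minimal sketch of such a layer, with the interpolation exposed as a hypothetical morph method (names are ours, not the repo's):

import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """Minimal sketch: parameter-free batch norm followed by a
    class-conditional scale (gamma) and shift (beta)."""
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.gamma = nn.Embedding(num_classes, num_features)
        self.beta = nn.Embedding(num_classes, num_features)
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, y):
        g = self.gamma(y)[:, :, None, None]  # (B, C, 1, 1) per-class scale
        b = self.beta(y)[:, :, None, None]   # (B, C, 1, 1) per-class shift
        return g * self.bn(x) + b

    def morph(self, x, y_a, y_b, t):
        # Category morphing: blend the two classes' BN parameters with
        # weight t in [0, 1]; sweeping t from 0 to 1 yields the movies' effect.
        g = (1 - t) * self.gamma(y_a) + t * self.gamma(y_b)
        b = (1 - t) * self.beta(y_a) + t * self.beta(y_b)
        return g[:, :, None, None] * self.bn(x) + b[:, :, None, None]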

Other materials

  • Generated images

    • from the model trained on all ImageNet images (1K categories), 128px

    • from the model trained on dog and cat images (143 categories), 128px

  • Pretrained models

  • Movies

    • 4 corners category morph.


References

  • Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida. Spectral Normalization for Generative Adversarial Networks. ICLR 2018. OpenReview.

  • Takeru Miyato, Masanori Koyama. cGANs with Projection Discriminator. ICLR 2018. OpenReview.

Setup

Install required python libraries:

pip install -r requirements.txt

Download ImageNet dataset:

Please download the ILSVRC2012 dataset from http://image-net.org/download-images

Preprocess dataset:

cd datasets
IMAGENET_TRAIN_DIR=/path/to/imagenet/train/
PREPROCESSED_DATA_DIR=/path/to/save_dir/
bash preprocess.sh $IMAGENET_TRAIN_DIR $PREPROCESSED_DATA_DIR
# Make the list of image-label pairs for all images (1000 categories, 1281167 images).
python imagenet.py $PREPROCESSED_DATA_DIR
# Make the list of image-label pairs for dog and cat images (143 categories, 180373 images). 
python imagenet_dog_and_cat.py $PREPROCESSED_DATA_DIR

Download inception model:

python source/inception/download.py --outfile=datasets/inception_model

Training examples

Spectral normalization + projection discriminator for 64x64 dog and cat images:

LOGDIR=/path/to/logdir
CONFIG=configs/sn_projection_dog_and_cat_64.yml
python train.py --config=$CONFIG --results_dir=$LOGDIR --data_dir=$PREPROCESSED_DATA_DIR

Spectral normalization + projection discriminator for 64x64 all ImageNet images:

LOGDIR=/path/to/logdir
CONFIG=configs/sn_projection_64.yml
python train.py --config=$CONFIG --results_dir=$LOGDIR --data_dir=$PREPROCESSED_DATA_DIR
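For orientation, the papers train these models with the hinge version of the adversarial loss; a minimal sketch of the two objectives, where d_real and d_fake denote discriminator outputs on real pairs (x, y) and on generated pairs (G(z, y), y):

import torch.nn.functional as F

def discriminator_hinge_loss(d_real, d_fake):
    # E[max(0, 1 - D(x, y))] + E[max(0, 1 + D(G(z, y), y))]
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def generator_hinge_loss(d_fake):
    # the generator simply maximizes the discriminator's score on its samples
    return -d_fake.mean()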

Evaluation

Calculate inception score (with the original OpenAI implementation)

python evaluations/calc_inception_score.py --config=$CONFIG --snapshot=${LOGDIR}/ResNetGenerator_<iterations>.npz --results_dir=${LOGDIR}/inception_score --splits=10 --tf
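The reported score is exp(E_x KL(p(y|x) || p(y))), computed from the Inception network's softmax outputs and averaged over --splits disjoint splits. A minimal numpy sketch of the computation, for orientation only (the command above delegates to the original OpenAI TensorFlow implementation):

import numpy as np

def inception_score(preds, splits=10, eps=1e-12):
    # preds: (N, num_classes) softmax outputs of the Inception net on generated images
    scores = []
    for part in np.array_split(preds, splits):
        p_y = part.mean(axis=0, keepdims=True)                         # marginal p(y)
        kl = (part * (np.log(part + eps) - np.log(p_y + eps))).sum(axis=1)
        scores.append(np.exp(kl.mean()))                               # per-split score
    return float(np.mean(scores)), float(np.std(scores))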

Generate images and save them in ${LOGDIR}/gen_images

python evaluations/gen_images.py --config=$CONFIG --snapshot=${LOGDIR}/ResNetGenerator_<iterations>.npz --results_dir=${LOGDIR}/gen_images



