Machine Learning - The Rantings of a Mad Computer Scientist

AC-GAN, Auxiliary Classifier Generative Adversarial Networks

Posted on 12 April 2017 by Joss Whittle

Tags: Machine Learning, Computer Graphics

In this project I implemented the paper Conditional Image Synthesis With Auxiliary Classifier GANs, Odena et al. 2016, using the Keras machine learning framework.

Generative Adversarial Networks (Goodfellow et al. 2014) are a training regime for teaching neural networks to synthesize data that could plausibly have come from a distribution of real data. The data are commonly images with a shared theme or aesthetic style, such as celebrity faces (CelebA), handwritten digits (MNIST), or bedrooms (LSUN-Bedroom).

In GANs two models are trained: a generative model that progressively learns to synthesize realistic and plausible images from a random noise input (the latent vector), and a discriminative model that learns to tell these generated (fake) images from real images sampled from the target dataset. The two models are trained in lock-step, such that the generative model learns to fool the discriminator, and the discriminator adapts to become better at not being fooled by the generator.

This forms a minimax game between the two models which, in the ideal case, converges towards a Nash equilibrium. At this point the generator should be able to consistently produce convincing images that appear to be from the original dataset, but are in fact parameterized by the latent vector fed to the generative model.
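For reference, Goodfellow et al. 2014 write this minimax game as the value function below. Here D(x) is the discriminator's estimated probability that x is a real image, G(z) is the image generated from the latent vector z, p_data is the distribution of real images, and p_z is the noise distribution that the latent vectors are drawn from.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```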

Auxiliary Classifier GANs extend the standard GAN architecture by jointly optimizing the generator's ability to fool the discriminative model and the discriminator's ability to correctly identify which class it was shown (here, which digit). This allows the generative model to be parameterized not only by a random latent vector, but also by an encoding of which digit we would like it to synthesize.
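As a rough illustration of the two terms involved (this is not the code from this project), Odena et al. define a source log-likelihood L_S (real versus fake) and a class log-likelihood L_C (which digit). The discriminator is trained to maximize L_S + L_C and the generator to maximize L_C - L_S. The NumPy sketch below evaluates both on made-up discriminator outputs; the function names, array names, and numbers are all hypothetical.

```python
import numpy as np

EPS = 1e-12  # avoids log(0)

def source_log_likelihood(p_real_on_real, p_real_on_fake):
    """L_S: mean log-probability assigned to the correct source (real or fake)."""
    return (np.mean(np.log(p_real_on_real + EPS))
            + np.mean(np.log(1.0 - p_real_on_fake + EPS)))

def class_log_likelihood(probs_real, labels_real, probs_fake, labels_fake):
    """L_C: mean log-probability assigned to the correct class, on both batches."""
    real = probs_real[np.arange(len(labels_real)), labels_real]
    fake = probs_fake[np.arange(len(labels_fake)), labels_fake]
    return np.mean(np.log(real + EPS)) + np.mean(np.log(fake + EPS))

# Made-up outputs for a batch of 4 real and 4 generated images.
p_real_on_real = np.array([0.9, 0.8, 0.95, 0.7])   # P(source = real | real image)
p_real_on_fake = np.array([0.2, 0.1, 0.4, 0.3])    # P(source = real | fake image)
labels_real = np.array([0, 1, 2, 3])
labels_fake = np.array([4, 5, 6, 7])               # classes the generator was asked for

# Class probabilities over the 10 digits; each row sums to 1.
probs_real = np.full((4, 10), 0.02)
probs_real[np.arange(4), labels_real] = 0.82
probs_fake = np.full((4, 10), 0.05)
probs_fake[np.arange(4), labels_fake] = 0.55

l_s = source_log_likelihood(p_real_on_real, p_real_on_fake)
l_c = class_log_likelihood(probs_real, labels_real, probs_fake, labels_fake)

d_objective = l_s + l_c  # the discriminator maximizes this
g_objective = l_c - l_s  # the generator maximizes this
```

Both models benefit from a high L_C, so the class signal is cooperative, while L_S is the adversarial part that the two models pull in opposite directions.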

ac-gan

The above image shows the result of my AC-GAN implementation trained on the MNIST dataset. On the left we see real images sampled randomly from MNIST for each of the 10 digit classes, and on the right we see images synthesized by the generative model for each class. The generated images are not sampled completely at random: for this image I selected a random value of the latent vector and swept it from 0 to 1. We can see that for each digit class this had the subtle effect of adjusting rotation and "flair", or perhaps "serif-ness", showing that the generative model has mapped the space of possible values in the latent vector to different stylistic traits of the produced digits.
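A sweep like this could be set up roughly as follows. This is only a sketch under my own assumptions, not the project's code: it assumes a 100-dimensional latent vector drawn uniformly from [0, 1], a one-hot class encoding, and that the sweep varies a single randomly chosen element while holding the rest fixed. The function name latent_sweep and all of its parameters are hypothetical.

```python
import numpy as np

def latent_sweep(latent_dim=100, steps=8, num_classes=10, seed=0):
    """Build generator inputs that sweep one latent element from 0 to 1 per class."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.0, 1.0, size=latent_dim)  # fixed random latent vector
    index = int(rng.integers(latent_dim))          # element chosen for the sweep
    values = np.linspace(0.0, 1.0, steps)          # sweep values 0 ... 1

    # One row per (class, sweep step) pair, all sharing the same base vector.
    z = np.tile(base, (num_classes * steps, 1))
    z[:, index] = np.tile(values, num_classes)

    # Class labels 0,0,...,1,1,...,9,9 as one-hot rows.
    labels = np.repeat(np.arange(num_classes), steps)
    one_hot = np.eye(num_classes)[labels]
    return z, one_hot, index

z, one_hot, index = latent_sweep()
# z and one_hot would then be passed to the trained generator,
# giving one row of `steps` images per digit class.
```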

The results of this experiment are satisfying but not great overall. I believe the model suffers from at least partial "mode collapse", where the generator learns to produce a subset of possible stylistic variations convincingly and so never attempts to learn how to produce other stylistic variants.

Since the publication of Goodfellow's seminal work on GANs, many variations have been proposed that attempt to solve common issues such as mode collapse and training instability.

In the future I plan to revisit this project and implement some of the newer and more advanced methods. While the code for this project is written as a Jupyter notebook, I do not plan to release it as it is not very clean or well documented. I will, however, release well documented code when I revisit this project.

Neural Artistic Style Transfer

Posted on 3 April 2017 by Joss Whittle

Tags: Machine Learning, Computer Graphics

In this project I implemented the paper A Neural Algorithm of Artistic Style, Gatys et al. 2015, using the Keras machine learning framework.

cat-amuse-combined
Cat photo credit: Claire Whittle

My implementation was loosely based on the fantastic Keras example code by Francois Chollet. In my implementation I modified the VGG19 architecture, using pre-trained weights trained on ImageNet. I replaced the max pooling layers with average pooling using the same strides, and discarded the fully connected layers at the end of the network as they are not needed and take up unnecessary memory on the GPU.
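To make the pooling swap concrete, here is a small NumPy sketch (not the Keras code from this project) that applies 2x2 pooling with stride 2, the window and stride VGG19 uses, in both modes. The function name pool2x2 and the toy feature map are hypothetical.

```python
import numpy as np

def pool2x2(feature_map, mode="avg"):
    """2x2 pooling with stride 2 over a single 2D feature map."""
    h, w = feature_map.shape
    # Drop any odd trailing row/column, then group pixels into 2x2 blocks.
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    return blocks.max(axis=(1, 3))

x = np.array([[1.0, 2.0, 0.0, 0.0],
              [3.0, 4.0, 0.0, 8.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 4.0, 1.0, 1.0]])

max_pooled = pool2x2(x, mode="max")  # [[4.0, 8.0], [4.0, 1.0]]
avg_pooled = pool2x2(x, mode="avg")  # [[2.5, 2.0], [1.0, 1.0]]
```

Both modes produce the same output shape, which is why the swap leaves the rest of the network unchanged. Average pooling lets every pixel in a block contribute to the output, and therefore to the gradient, rather than only the largest one.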

Francois' code makes use of the SciPy L-BFGS optimizer. While this produced nice results in a small number of iterations, I found that the high memory requirement of L-BFGS (even though the L stands for Limited-memory) was prohibitive for producing images at resolutions higher than around 400x400. Through experimentation I found that the SciPy Conjugate Gradient optimizer provided good results with greatly reduced memory usage, allowing me to raise the resolution of produced images to around 720p on a single NVIDIA 870M GPU.
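The switch between the two optimizers can be sketched with scipy.optimize.minimize, which exposes both as the methods "L-BFGS-B" and "CG". The example below is not the code from the linked Gist: it uses a toy squared-error loss against a random target image as a stand-in for the real style and content loss, and the names loss_and_grads, shape, and target are hypothetical. What it shows is the calling pattern, where the optimizer works on a flat float64 vector and the loss function reshapes it back into an image.

```python
import numpy as np
from scipy.optimize import minimize

shape = (32, 32, 3)  # small stand-in for the generated image
rng = np.random.default_rng(0)
target = rng.uniform(0.0, 1.0, size=shape)

def loss_and_grads(flat_image):
    """Toy loss: squared distance to `target`. Returns (loss, flat gradient)."""
    image = flat_image.reshape(shape)
    diff = image - target
    loss = 0.5 * np.sum(diff ** 2)
    return loss, diff.ravel().astype(np.float64)

x0 = rng.uniform(0.0, 1.0, size=shape).ravel()

# jac=True tells SciPy that the function returns the loss and gradient together.
result = minimize(loss_and_grads, x0, jac=True, method="CG",
                  options={"maxiter": 50})

generated = result.x.reshape(shape)
```

Changing method="CG" to method="L-BFGS-B" runs the other optimizer on the same function. L-BFGS keeps a history of recent update vectors to approximate curvature, whereas nonlinear conjugate gradient only carries a few vectors between iterations, which fits the difference in memory use described above.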

I plan to revisit this project in the future, implementing it entirely in TensorFlow. I may also investigate newer and more advanced methods that have been proposed since the publication of Gatys' seminal paper in this area.

Full code for this project is available here as a Gist.

In the remainder of this post I will show some of the images that I produced with the linked code.

boat1-starrynight-combined-1
Boat photo credit: John Whittle