# Generative Adversarial Networks: Correction

Author: Bruno Galerne using various sources (see last cell).


Check that a GPU is available:

In [None]:
!nvidia-smi

In [None]:
from __future__ import print_function

import matplotlib.pyplot as plt
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
import os

from PIL import Image
from IPython.display import display


# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

# GAN MNIST




Load MNIST and define a function to view images:

In [None]:
transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,))
        ])

datatrain = datasets.MNIST('.', train=True, download=True, transform=transform)
batch_size = 100
trainloader = torch.utils.data.DataLoader(datatrain,
                                          batch_size=batch_size,
                                          shuffle=True)

def imshow(img):
    img = img*0.5 + 0.5     # unnormalize
    pil_img = torchvision.transforms.functional.to_pil_image(img)
    display(pil_img)
    print("Image size (w x h): ",  pil_img.width, "x", pil_img.height)
    return(pil_img)

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # iterator objects no longer expose .next() in recent PyTorch

# show images
print('Images:')
imshow(torchvision.utils.make_grid(images, nrow=10))
# print labels
print('Labels:')
for i in range(10):
  print(' '.join('%5s' % str(labels[10*i+j].numpy()) for j in range(10)))


## Generative and discriminative models:

**Exercise**  
Define the generator and discriminator models following the instructions below:

*Generator*  
Define a class `G_Net` with the following architecture:

Generator (input: a random vector in the latent space of dimension $k$).
$k$ is a parameter of the constructor, see below.  
The network applies the following list of operations:
1. Fully connected layer, output size 256
2. Leaky ReLU ($\alpha = 0.2$) activation
3. Fully connected layer, output size 512 
4. Leaky ReLU ($\alpha = 0.2$) activation 
5. Fully connected layer, output size 784 
6. Tanh activation
7. Reshape to 1×28×28 (use `Tensor.view()`)

*Discriminator*  
Define a class `D_Net` with the following architecture:
1. Flatten images to 1D tensors of size 784 = 28*28.
2. Fully connected layer, output size 512
3. Leaky ReLU ($\alpha = 0.2$) activation
4. Fully connected layer, output size 256
5. Leaky ReLU ($\alpha = 0.2$) activation
6. Fully connected layer, output size 1

Define instances of the networks using $k=32$.

**Remark:** One should add a sigmoid layer to the discriminator to output a probability but this is implicitly done in the criterion (see next exercise).

See https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#define-a-convolutional-neural-network for an example of neural network definition.

In [None]:
# Generator network:
class G_Net(nn.Module):
  def __init__(self, k):
    super(G_Net, self).__init__()
    # three fully connected layers: k -> 256 -> 512 -> 784
    self.fc1 = nn.Linear(k, 256)
    self.fc2 = nn.Linear(256,512)
    self.fc3 = nn.Linear(512, 784)

  def forward(self,x):  
    # two leaky ReLU layers, then tanh output reshaped to 1x28x28 images
    x = F.leaky_relu(self.fc1(x),0.2)
    x = F.leaky_relu(self.fc2(x),0.2)
    x = torch.tanh(self.fc3(x))
    x = x.view(-1,1,28,28)
    return(x)


# Discriminator network:
class D_Net(nn.Module):
  def __init__(self):
    super(D_Net, self).__init__()
    self.fc1 = nn.Linear(784, 512)
    self.fc2 = nn.Linear(512, 256)
    self.fc3 = nn.Linear(256, 1)

  def forward(self,x):
    x = x.view(-1,784) # flatten images
    x = self.fc1(x)
    x = F.leaky_relu(x, negative_slope=0.2)
    x = self.fc2(x)
    x = F.leaky_relu(x, negative_slope=0.2)
    x = self.fc3(x)
    return(x)

k = 32
G_net = G_Net(k).to(device)
D_net = D_Net().to(device)




## View some generated images

In [None]:
def show_G_net(z=None):
  # provide random latent code as option to see evolution
  with torch.no_grad():
    if z is None:
      z = torch.randn(100,k).to(device)
    genimages = G_net(z)
    pil_img = imshow(torchvision.utils.make_grid(genimages.to('cpu'),nrow=10))
    return(pil_img)

show_G_net()
zshow = torch.randn(100,k).to(device)

# reproducible:
show_G_net(zshow)

show_G_net(zshow)


## GAN training code

In PyTorch, the criterion `nn.BCEWithLogitsLoss` combines a `Sigmoid` layer and the `BCELoss`, that is, for $(x,y)\in\mathbb{R}\times \{0,1\}$,
$$
\ell(x,y) =  -y \cdot \log \sigma(x)
        - (1 - y) \cdot \log (1 - \sigma(x))
$$
where $\sigma: \mathbb{R}\to (0,1)$ is the sigmoid function defined by
$$
\sigma(x) = \frac{e^x}{1+e^{x}} = \frac{1}{1+e^{-x}}.
$$
The sigmoid function plays the role of the `softmax` function for binary classification since it maps $\mathbb{R}\to (0,1)$ to produce the probability of being in the class $y=1$ (and then $1 - \sigma(x)$ is the probability of being in the class $y=0$).
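
As a quick sanity check of the formula above, here is a minimal sketch comparing `nn.BCEWithLogitsLoss` with a manual evaluation of $\ell(x,y)$. The logits `x` and targets `y` are random placeholder values (not discriminator outputs), and the default `reduction='mean'` averages $\ell$ over the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder logits and binary targets (assumed values, for illustration only)
x = torch.randn(8, 1)
y = torch.randint(0, 2, (8, 1)).float()

# Built-in criterion: sigmoid followed by binary cross-entropy, averaged over the batch
loss_builtin = nn.BCEWithLogitsLoss()(x, y)

# Manual evaluation of l(x, y) = -y log(sigma(x)) - (1 - y) log(1 - sigma(x))
sigma = torch.sigmoid(x)
loss_manual = (-y * torch.log(sigma) - (1 - y) * torch.log(1 - sigma)).mean()

print(loss_builtin.item(), loss_manual.item())
```

Both values should agree up to floating-point error. The built-in version is preferable in practice because it is numerically more stable for logits of large magnitude.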

In the course formula of the discriminator loss, 
$$
      \max_{\theta_d}
      \underbrace{
        \sum_{x_{\text{real}} \in \mathcal{T}_{\text{real}}} \log D_{\theta_d}(x_{\text{real}})}_{
        \substack{
          \text{force predicted labels to be 1}\\
          \text{for real images}
        }
      }
      +
      \underbrace{
        \sum_{x_{\text{fake}} \in \mathcal{T}_{\text{fake}}} \log (1 - D_{\theta_d}(x_{\text{fake}}))}_{
        \substack{
          \text{force predicted labels to be 0}\\
          \text{for fake images}
        }}
$$
the sigmoid layer is implicitly included in $D_{\theta_d}$, but this will not be the case in the PyTorch implementation.
In short, 
$$
D_{\theta_d}(x) = \sigma(\mathtt{D\_{net}}(x)).
$$
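
To make the link with the implementation concrete, the following sketch checks that the objective above, with $D_{\theta_d} = \sigma \circ \mathtt{D\_{net}}$, is the opposite of the sum of two `nn.BCEWithLogitsLoss` terms, up to the factor $1/b$ coming from the default mean reduction. The tensors `logits_real` and `logits_fake` are random stand-ins for the raw outputs of `D_net` on real and fake batches, and `b` is a small hypothetical batch size.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
b = 5  # hypothetical batch size

# Random stand-ins for D_net(x_real) and D_net(x_fake) (raw logits, no sigmoid)
logits_real = torch.randn(b, 1)
logits_fake = torch.randn(b, 1)

# Course objective (to maximize), with D = sigmoid(D_net)
objective = (torch.log(torch.sigmoid(logits_real)).sum()
             + torch.log(1 - torch.sigmoid(logits_fake)).sum())

# PyTorch loss (to minimize): target 1 for real images, target 0 for fake images
criterion = nn.BCEWithLogitsLoss()  # default reduction='mean'
D_loss = (criterion(logits_real, torch.ones(b, 1))
          + criterion(logits_fake, torch.zeros(b, 1)))

print(D_loss.item(), (-objective / b).item())
```

Minimizing `D_loss` is therefore equivalent to maximizing the objective, since a positive constant factor does not change the optimum.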

**Exercise**

Implement the following training algorithm, where $b$ is the batch size (and we take $m=n=b$ in the loss formula):


> For each batch of images $x_{\text{real}}$:
> > **1) Train discriminator:**
> > > Generate $z$, a tensor of size $b\times k$ of i.i.d. Gaussian variables  
> > > Generate $x_{\text{fake}} = \mathtt{G\_{net}}(z)$, a set of $b$ fake images  
> > > Compute the loss to minimize for the discriminator (the opposite of the objective above) using `nn.BCEWithLogitsLoss`
> > >  
> > > Compute the gradient and do an optimizer step for the discriminator parameters  

> > **2) Train the generator:**
> > > Generate $z$, a new tensor of size $b\times k$ of i.i.d. Gaussian variables  
> > > Compute the loss to minimize
$$
      \underbrace{
        -
        \sum_{z \in \mathcal T_{\text{rand}}} \log D_{\theta_d}(G_{\theta_g}(z))
      }_{
        \substack{
          \text{force the discriminator to think that}\\
          \text{our generated fake images are real (close to 1)}
        }
      }
$$
using `nn.BCEWithLogitsLoss`  
Compute the gradient and do an optimizer step for the generator parameters

Train the networks for 20 epochs using batch size $b=100$.
Display the current losses and show generated images at the end of each epoch.


In [None]:

D_lr = 0.0002
G_lr = 0.0002
# each optimizer uses the learning rate of its own network
G_optimizer = optim.Adam(G_net.parameters(), lr=G_lr,betas=(0.5, 0.999))
D_optimizer = optim.Adam(D_net.parameters(), lr=D_lr,betas=(0.5, 0.999))

# # mode collapse with SGD:
# G_optimizer = optim.SGD(G_net.parameters(), lr=G_lr, momentum=0.9)
# D_optimizer = optim.SGD(D_net.parameters(), lr=D_lr, momentum=0.9)

criterion = nn.BCEWithLogitsLoss()


In [None]:
num_epochs = 20

# binary class arrays
bsones = torch.ones((batch_size,1), dtype=torch.float).to(device) 
bszeros = torch.zeros((batch_size,1), dtype=torch.float).to(device)

for epoch in range(num_epochs):
  for batch_idx, data in enumerate(trainloader,0):
    # train discriminator:

    x, labels = data
    x = x.to(device)
    D_optimizer.zero_grad()

    z = torch.randn((batch_size, k)).to(device)
    xfake = G_net(z).detach()  # no generator gradient is needed for the discriminator step
    D_loss_real = criterion(D_net(x), bsones)
    D_loss_fake = criterion(D_net(xfake),  bszeros)
    D_loss = D_loss_real + D_loss_fake

    D_loss.backward()
    D_optimizer.step()

    # train generator:

    G_optimizer.zero_grad()
    z = torch.randn((batch_size, k)).to(device)
    xfake = G_net(z)
    G_loss = criterion(D_net(xfake), bsones)

    G_loss.backward()
    G_optimizer.step()

  # end of epoch:
  print('Train Epoch: {} [{}/{} ({:.0f}%)]  DisLoss: {:.6e} GenLoss: {:.6e}'.format(
      epoch+1, (batch_idx+1) * batch_size, len(trainloader.dataset),
      100. * (batch_idx+1) / len(trainloader), D_loss.item(), G_loss.item() ))
  # show 100 generated images with same latent z for each epoch:
  show_G_net(zshow)


## Load checkpoint trained for 100 epochs

In [None]:
k = 32
G_net = G_Net(k).to(device)
!wget -c 'https://www.idpoisson.fr/galerne/m2_reseaux_neurones/GAN_G_net_ep100.pth'
G_net.load_state_dict(torch.load('GAN_G_net_ep100.pth', map_location=device))


## Interpolation in latent space:

**Exercise:** 

Generate 2 sets of 10 latent variables $z_0$ and $z_1$ and display the images generated from the latent variables:
$$
z_\theta = (1-\theta) z_0 + \theta z_1
$$
for $\theta$ varying between $0$ and $1$ (use the `torch.linspace` function with 20 values, endpoints included).
Display all the images in a grid of height 10 and width 20 images.
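
The broadcasting behind this interpolation can be sketched on the CPU without any network. The sizes `nlatent`, `ninterp` and `k` below are small hypothetical values chosen for readability. The sketch follows the formula $z_\theta = (1-\theta) z_0 + \theta z_1$ and arranges the result so that each latent pair occupies one row of the final grid.

```python
import torch

torch.manual_seed(0)
nlatent, ninterp, k = 3, 5, 4  # small hypothetical sizes

z0 = torch.randn(nlatent, k)
z1 = torch.randn(nlatent, k)

# theta has shape (ninterp, 1, 1) so that it broadcasts against (nlatent, k)
theta = torch.linspace(0., 1., ninterp).view(ninterp, 1, 1)

z = (1. - theta) * z0 + theta * z1  # shape (ninterp, nlatent, k)
z = z.transpose(1, 0)               # shape (nlatent, ninterp, k): one row per latent pair

print(z.shape)
```

With this layout, `z[i, 0]` equals `z0[i]` and `z[i, -1]` equals `z1[i]`. A network ending with `view(-1, 1, 28, 28)` then flattens the first two dimensions row by row, so a grid with `nrow=ninterp` shows one interpolation path per row.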


In [None]:
with torch.no_grad():
  nlatent = 10
  ninterp = 20
  z0 = torch.randn(nlatent,k).to(device)
  z1 = torch.randn(nlatent,k).to(device)

  alpha = torch.linspace(0.,1.,ninterp).view(ninterp,1,1).to(device)

  z = (1.-alpha)*z0 + alpha*z1  # z_theta = (1-theta) z0 + theta z1, so the path goes from z0 to z1
  z = z.transpose(1,0)
  print(z.size())
  genimages = G_net(z)
  imshow(torchvision.utils.make_grid(genimages.to('cpu'), nrow=ninterp))




## Mode collapse:

**Exercise:** 
Observe that when switching from Adam to SGD, the generator suffers from mode collapse.

# Sources:
Load MNIST: https://github.com/pytorch/examples/blob/master/mnist/main.py  
GAN architecture: TP GAN by Alasdair Newson: https://sites.google.com/site/alasdairnewson/teaching.  
Another more complex model: https://github.com/RAKIYOU/Training-GAN-on-MNIST