Generative Deep Learning Updated Edition, Chapter 84

6.4 Evaluating the Model


Evaluating the performance of a Variational Autoencoder (VAE) is crucial to ensure that it has learned meaningful latent representations and can generate high-quality images. In this section, we will discuss several methods for evaluating our VAE, including quantitative metrics and qualitative assessments, and provide example code for each technique.

6.4.1 Quantitative Evaluation Metrics

Quantitative metrics provide objective measures of the model's performance. For VAEs, some common metrics include Reconstruction Loss, KL Divergence, Inception Score (IS), and Fréchet Inception Distance (FID).

Reconstruction Loss

Reconstruction loss measures how well the decoder can reconstruct the input images from their latent variables. A lower reconstruction loss indicates that the model produces reconstructions that closely resemble the original inputs.

Example: Calculating Reconstruction Loss

import numpy as np
from tensorflow.keras.losses import binary_crossentropy

# Calculate reconstruction loss for the test set
reconstructed_images = vae.predict(x_test)
reconstruction_loss = np.mean(binary_crossentropy(x_test, reconstructed_images))
print(f"Reconstruction Loss: {reconstruction_loss}")

The script first imports the necessary libraries and then uses the trained VAE to reconstruct the images in the test set. The reconstruction loss, which measures the difference between the original and reconstructed images, is computed with the binary cross-entropy loss function and averaged over all pixels and images. Finally, the reconstruction loss is printed.
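To make the metric concrete, here is a minimal NumPy sketch of what binary cross-entropy computes per pixel, averaged into one score per image. It assumes pixel values in [0, 1], as with sigmoid decoder outputs; the function name `per_image_bce` is ours, and the clipping constant mirrors Keras's default epsilon of 1e-7. Per-image scores make it easy to rank test images and inspect the worst reconstructions.

```python
import numpy as np

def per_image_bce(x_true, x_pred, eps=1e-7):
    """Mean binary cross-entropy per image.

    x_true, x_pred: arrays of shape (n_images, ...) with values in [0, 1].
    Returns an array of shape (n_images,).
    """
    # Clip predictions away from 0 and 1 so the logarithms stay finite
    p = np.clip(x_pred, eps, 1.0 - eps)
    bce = -(x_true * np.log(p) + (1.0 - x_true) * np.log(1.0 - p))
    # Average over every axis except the batch axis
    return bce.reshape(len(bce), -1).mean(axis=1)

# Toy check with two 2x2 "images" that share the same target
x = np.array([[[1.0, 0.0], [1.0, 0.0]],
              [[1.0, 0.0], [1.0, 0.0]]])
pred = np.array([[[0.99, 0.01], [0.99, 0.01]],   # close to the target
                 [[0.5, 0.5], [0.5, 0.5]]])      # uninformative guess
scores = per_image_bce(x, pred)
print(scores)  # the first score is much lower than the second
worst_first = np.argsort(scores)[::-1]  # indices of the worst reconstructions first
```

Applied to the test set, `per_image_bce(x_test, reconstructed_images).mean()` should closely match the Keras value above (small differences can come from how epsilon is applied), because the mean of per-image means equals the overall mean when every image has the same number of pixels.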

KL Divergence

KL divergence measures the difference between the latent distribution the encoder produces for each input, q(z|x), and the prior distribution p(z), usually a standard normal distribution. A lower KL divergence indicates that the learned latent distribution is closer to the desired prior.
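For a diagonal Gaussian encoder q(z|x) = N(μ, diag(σ²)) and a standard normal prior N(0, I), this divergence has a closed form, summed over the d = latent_dim latent dimensions. The example below evaluates exactly this expression, with z_mean playing the role of μ and z_log_var the role of log σ²:

```latex
D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,\mathcal{N}(0, I)\big)
  = -\frac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```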

Example: Calculating KL Divergence

import numpy as np

# Calculate KL Divergence for the test set
def calculate_kl_divergence(encoder, x_test):
    # The encoder is assumed to return (z_mean, z_log_var, z)
    z_mean, z_log_var, _ = encoder.predict(x_test)
    kl_divergence = 1 + z_log_var - np.square(z_mean) - np.exp(z_log_var)
    kl_divergence = np.sum(kl_divergence, axis=-1)  # sum over latent dimensions
    kl_divergence *= -0.5
    return np.mean(kl_divergence)  # average over test images

kl_divergence = calculate_kl_divergence(encoder, x_test)
print(f"KL Divergence: {kl_divergence}")

The function calculate_kl_divergence takes an encoder and x_test as inputs. The encoder predicts the mean and log variance of each image's latent distribution (z_mean and z_log_var), which are plugged into the closed-form KL divergence between a diagonal Gaussian and a standard normal distribution. The divergence is summed over the latent dimensions for each test image, and the mean across the test set is returned.

Finally, the KL divergence is calculated using this function and printed to the console.
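A quick way to gain confidence in the closed-form expression used in calculate_kl_divergence is to compare it with a Monte Carlo estimate of E_q[log q(z) - log p(z)], drawing samples with the reparameterization z = μ + σ·ε. The NumPy sketch below does this for a single latent vector; the function names, sample count, and seed are illustrative choices, not part of the model code.

```python
import numpy as np

def kl_closed_form(z_mean, z_log_var):
    # Same expression as calculate_kl_divergence, for one latent vector
    return -0.5 * np.sum(1 + z_log_var - np.square(z_mean) - np.exp(z_log_var))

def kl_monte_carlo(z_mean, z_log_var, n_samples=200_000, seed=0):
    rng = np.random.default_rng(seed)
    sigma = np.exp(0.5 * z_log_var)
    eps = rng.standard_normal((n_samples, z_mean.shape[0]))
    z = z_mean + sigma * eps  # reparameterized samples from q(z|x)
    # log q(z) - log p(z); the log(2*pi) constants cancel
    log_q = np.sum(-0.5 * z_log_var - 0.5 * np.square(eps), axis=1)
    log_p = np.sum(-0.5 * np.square(z), axis=1)
    return np.mean(log_q - log_p)

z_mean = np.array([0.5, -1.0])
z_log_var = np.array([0.2, -0.5])
print(kl_closed_form(z_mean, z_log_var), kl_monte_carlo(z_mean, z_log_var))
```

The two numbers should agree to about two decimal places; a large gap would point to a sign or log-variance mistake in the closed-form code.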

Inception Score (IS)

Inception Score (IS) evaluates the quality and diversity of generated images. It uses a pre-trained Inception network to classify the generated images, computes the KL divergence between each image's conditional label distribution p(y|x) and the marginal label distribution p(y), and exponentiates the average of these divergences. Higher scores are better.
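Written out, with p(y|x) the Inception class probabilities for a generated image x and p(y) their average over the generated set, the score is:

```latex
\mathrm{IS} = \exp\!\Big(\mathbb{E}_{x \sim p_g}\, D_{\mathrm{KL}}\big(p(y \mid x)\,\|\,p(y)\big)\Big),
\qquad p(y) = \mathbb{E}_{x \sim p_g}\, p(y \mid x)
```

The score is bounded above by the number of classes, and reaches that bound only when every image is classified with certainty and the predicted classes are spread uniformly.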

Example: Calculating Inception Score

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from scipy.stats import entropy

# Function to calculate Inception Score
def calculate_inception_score(images, n_split=10):
    # IS needs class probabilities, so keep the 1000-class ImageNet softmax head
    model = InceptionV3(include_top=True, weights='imagenet')
    # InceptionV3 expects 299x299 RGB inputs; grayscale digits are repeated across 3 channels
    images_rgb = tf.image.grayscale_to_rgb(tf.cast(images, tf.float32))
    images_resized = tf.image.resize(images_rgb, (299, 299))
    # preprocess_input expects pixel values in [0, 255]; decoder outputs are assumed to be in [0, 1]
    images_preprocessed = preprocess_input(images_resized * 255.0)
    preds = model.predict(images_preprocessed)

    split_scores = []
    for i in range(n_split):
        part = preds[i * preds.shape[0] // n_split: (i + 1) * preds.shape[0] // n_split]
        py = np.mean(part, axis=0)  # marginal label distribution p(y) for this split
        # With two arguments, scipy's entropy returns KL(p(y|x) || p(y))
        scores = [entropy(p, py) for p in part]
        split_scores.append(np.exp(np.mean(scores)))
    return np.mean(split_scores), np.std(split_scores)

# Generate images for evaluation
n_samples = 1000
random_latent_vectors = np.random.normal(size=(n_samples, latent_dim))
generated_images = decoder.predict(random_latent_vectors)
generated_images = generated_images.reshape((n_samples, 28, 28, 1))

# Calculate Inception Score
is_mean, is_std = calculate_inception_score(generated_images)
print(f"Inception Score: {is_mean} ± {is_std}")

The function calculate_inception_score takes a set of images as input, resizes them to the 299x299 input size expected by the InceptionV3 model, and preprocesses them. It then uses InceptionV3 to classify the preprocessed images.

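The score is computed on n_split disjoint subsets of the predictions. Within each subset, the mean prediction serves as the marginal label distribution p(y); the KL divergence between each image's label distribution p(y|x) and p(y) is averaged over the subset and exponentiated. The function returns the mean and standard deviation of these per-split scores.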

Finally, the script generates 1,000 images by passing latent vectors sampled from a standard normal distribution through the VAE's decoder, reshapes them to 28x28x1, and calculates their Inception Score. The mean and standard deviation of the score are then printed.
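The split-and-score loop can be checked in isolation, without downloading InceptionV3, by feeding it a hand-made matrix of class probabilities. This NumPy sketch (the name `inception_score_from_probs` is ours, and it partitions with `np.array_split` rather than manual slicing) performs the same per-split computation and shows the two extreme cases: identical predictions give a score of 1, while confident predictions spread evenly over K classes give a score of K.

```python
import numpy as np

def inception_score_from_probs(preds, n_split=10, eps=1e-16):
    """preds: (n_images, n_classes) array of class probabilities p(y|x)."""
    split_scores = []
    for part in np.array_split(preds, n_split):
        py = part.mean(axis=0)  # marginal p(y) within this split
        # KL(p(y|x) || p(y)) for every image in the split; eps avoids log(0)
        kl = np.sum(part * (np.log(part + eps) - np.log(py + eps)), axis=1)
        split_scores.append(np.exp(kl.mean()))
    return np.mean(split_scores), np.std(split_scores)

n_classes = 10
# Every image gets the same uniform prediction: no confidence, no diversity
uniform = np.full((1000, n_classes), 1.0 / n_classes)
# Each image is a confident one-hot prediction, with classes cycling 0..9
one_hot = np.eye(n_classes)[np.arange(1000) % n_classes]

print(inception_score_from_probs(uniform))
print(inception_score_from_probs(one_hot))
```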

Fréchet Inception Distance (FID)

FID measures the distance between the distributions of real and generated images in the feature space of a pre-trained Inception network. Lower FID scores indicate that the generated images are more similar to the real images.
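Concretely, FID models the Inception features of each image set as a multivariate Gaussian, with means μ_r, μ_g and covariances Σ_r, Σ_g for the real and generated sets, and computes the squared Fréchet distance between the two Gaussians:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```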

Example: Calculating FID

import numpy as np
import tensorflow as tf
from numpy import cov, trace, iscomplexobj
from scipy.linalg import sqrtm
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# Convert [0, 1] grayscale images to the 299x299 RGB input that InceptionV3 expects
def prepare_for_inception(images):
    images = tf.image.grayscale_to_rgb(tf.cast(images, tf.float32))
    images = tf.image.resize(images, (299, 299))
    return preprocess_input(images * 255.0)

# Function to calculate FID
def calculate_fid(real_images, generated_images):
    # Without the classification head, pooling='avg' yields 2048-dimensional feature vectors
    model = InceptionV3(include_top=False, pooling='avg', input_shape=(299, 299, 3))
    act1 = model.predict(prepare_for_inception(real_images))
    act2 = model.predict(prepare_for_inception(generated_images))

    # Summarize each set of activations by its mean and covariance
    mu1, sigma1 = act1.mean(axis=0), cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), cov(act2, rowvar=False)
    ssdiff = np.sum((mu1 - mu2) ** 2.0)
    covmean = sqrtm(sigma1.dot(sigma2))
    # sqrtm can return small imaginary components due to numerical error
    if iscomplexobj(covmean):
        covmean = covmean.real
    fid = ssdiff + trace(sigma1 + sigma2 - 2.0 * covmean)
    return fid

# Sample real images (x_test is assumed to be scaled to [0, 1])
real_images = x_test[:n_samples].reshape((n_samples, 28, 28, 1))

# Calculate FID
fid_score = calculate_fid(real_images, generated_images)
print(f"FID Score: {fid_score}")

The function calculate_fid takes two parameters, real_images and generated_images. It first resizes both sets of images to the 299x299 input size expected by the InceptionV3 model, then preprocesses them and passes them through the network to obtain 2048-dimensional activations from its final average-pooling layer.

The mean and covariance of each set of activations are then used to calculate the FID score. FID measures how far apart the two feature distributions are; lower scores indicate that the generated images are statistically closer to the real ones.

Finally, the FID score between a sample of real images and the generated images is calculated and printed.
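The matrix square root is the numerically delicate step: `sqrtm` can return small imaginary parts, which is why the code above discards them. As an alternative sketch, assuming both covariance matrices are symmetric positive semi-definite, the trace term can be computed with NumPy's symmetric eigensolver alone, using the fact that Σ_r Σ_g has the same eigenvalues as Σ_r^(1/2) Σ_g Σ_r^(1/2). The functions `frechet_distance` and `_sqrt_psd` are hypothetical helpers that take precomputed means and covariances.

```python
import numpy as np

def _sqrt_psd(m):
    # Square root of a symmetric PSD matrix via its eigendecomposition
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    diff = mu1 - mu2
    root1 = _sqrt_psd(sigma1)
    # Tr((sigma1 @ sigma2)^(1/2)) = sum of sqrt(eigenvalues of root1 @ sigma2 @ root1)
    inner = root1 @ sigma2 @ root1
    inner = 0.5 * (inner + inner.T)  # symmetrize away rounding error
    eigvals = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    tr_covmean = np.sum(np.sqrt(eigvals))
    return diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean

# Two 2-D Gaussians: means differ by (1, 0), covariances I and 4I
mu_a, sigma_a = np.zeros(2), np.eye(2)
mu_b, sigma_b = np.array([1.0, 0.0]), 4.0 * np.eye(2)
print(frechet_distance(mu_a, sigma_a, mu_b, sigma_b))  # approximately 3.0
```

With the activations from calculate_fid, it would be called as `frechet_distance(act1.mean(axis=0), np.cov(act1, rowvar=False), act2.mean(axis=0), np.cov(act2, rowvar=False))`.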

6.4.2 Qualitative Evaluation

Qualitative evaluation involves visually inspecting the generated images to assess their quality and diversity. This method is subjective but provides valuable insights into the model's performance.

Visual Inspection

Visual inspection involves generating a set of images and examining them for realism and diversity. This helps identify any obvious issues such as blurriness, artifacts, or mode collapse.

Example: Visualizing Generated Images

import numpy as np
import matplotlib.pyplot as plt

# Function to visualize generated images
def visualize_generated_images(decoder, latent_dim, n_samples=10):
    # Sample latent vectors from the standard normal prior and decode them
    random_latent_vectors = np.random.normal(size=(n_samples, latent_dim))
    generated_images = decoder.predict(random_latent_vectors)
    generated_images = generated_images.reshape((n_samples, 28, 28))

    # Show the images side by side in a single row
    plt.figure(figsize=(10, 2))
    for i in range(n_samples):
        plt.subplot(1, n_samples, i + 1)
        plt.imshow(generated_images[i], cmap='gray')
        plt.axis('off')
    plt.show()

# Visualize generated images
visualize_generated_images(decoder, latent_dim)

This example defines a function, visualize_generated_images, and then calls it. The function samples n_samples latent vectors from a standard normal distribution (the VAE's prior), decodes them into images with the given decoder, reshapes them to 28x28, and displays them in a 1-by-n_samples grid of subplots. The script then calls the function with the trained decoder and the model's latent dimension, latent_dim.
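A single row of ten digits is a small sample for judging diversity. One option, sketched here under the assumption that the images are grayscale arrays of equal size, is to tile a larger batch into one grid image and show it with a single `plt.imshow` call. The helper name `tile_images` is ours.

```python
import numpy as np

def tile_images(images, n_rows, n_cols):
    """Arrange images of shape (n_rows * n_cols, h, w) into one (n_rows*h, n_cols*w) array."""
    n, h, w = images.shape
    if n != n_rows * n_cols:
        raise ValueError(f"expected {n_rows * n_cols} images, got {n}")
    grid = images.reshape(n_rows, n_cols, h, w)
    # Swap the grid-column axis and the pixel-row axis so each image lands in its cell
    return grid.transpose(0, 2, 1, 3).reshape(n_rows * h, n_cols * w)

# Toy batch: six 2x3 "images", each filled with its own index
batch = np.stack([np.full((2, 3), i, dtype=float) for i in range(6)])
grid = tile_images(batch, n_rows=2, n_cols=3)
print(grid.shape)  # (4, 9)
```

With the VAE, one could decode 100 latent vectors, reshape the result to (100, 28, 28), and display `tile_images(images, 10, 10)` with `plt.imshow(..., cmap='gray')`.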

Latent Space Traversal

Latent space traversal involves interpolating between points in the latent space and generating images at each step. This technique helps visualize how smoothly the VAE transitions between different data points and can reveal the structure of the latent space.

Example: Latent Space Traversal

import numpy as np
import matplotlib.pyplot as plt

# Function to perform latent space traversal
def latent_space_traversal(decoder, latent_dim, n_steps=10):
    # 1-D endpoints, so np.linspace returns shape (n_steps, latent_dim)
    start_point = np.random.normal(size=(latent_dim,))
    end_point = np.random.normal(size=(latent_dim,))
    interpolation = np.linspace(start_point, end_point, n_steps)

    generated_images = decoder.predict(interpolation)
    generated_images = generated_images.reshape((n_steps, 28, 28))

    plt.figure(figsize=(15, 2))
    for i in range(n_steps):
        plt.subplot(1, n_steps, i + 1)
        plt.imshow(generated_images[i], cmap='gray')
        plt.axis('off')
    plt.show()

# Perform latent space traversal
latent_space_traversal(decoder, latent_dim)

The function latent_space_traversal takes three parameters: a decoder, the latent dimension, and an optional number of steps (10 by default). It samples two random points in the latent space to serve as start and end points, then creates a linear interpolation of n_steps evenly spaced points between them.

These latent points are passed through the decoder to generate images, which are reshaped to 28x28 pixels (the size of MNIST digits) and displayed in a single row. The final line calls the function.
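The traversal above interpolates between two random prior samples. To see how the model moves between two actual digits, a common variant encodes two test images and interpolates between their latent means. The NumPy helper below (the name `interpolate_latents` is ours) builds the path; because the endpoints are 1-D vectors of length latent_dim, `np.linspace` returns an array of shape (n_steps, latent_dim), the batch shape a decoder expects.

```python
import numpy as np

def interpolate_latents(z_start, z_end, n_steps=10):
    """Linear path from z_start to z_end (both shape (latent_dim,)), endpoints included."""
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    # linspace over 1-D endpoints stacks the steps along a new first axis
    return np.linspace(z_start, z_end, n_steps)

path = interpolate_latents(np.zeros(4), np.ones(4), n_steps=5)
print(path.shape)  # (5, 4)
```

Assuming the encoder returns (z_mean, z_log_var, z) as in calculate_kl_divergence, usage could look like `z_mean, _, _ = encoder.predict(x_test[:2])` followed by `decoder.predict(interpolate_latents(z_mean[0], z_mean[1]))`, with the results plotted as in latent_space_traversal.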

6.4.3 Evaluating Specific Features

By exploring different regions of the latent space, we can generate digits with specific features and evaluate how well the VAE has learned to represent these features.

Example: Exploring Specific Latent Features

import numpy as np
import matplotlib.pyplot as plt

# Function to explore specific latent features
def explore_latent_features(decoder, latent_dim, feature_vector, variation_range=(-3, 3), n_variations=10):
    feature_variations = np.linspace(variation_range[0], variation_range[1], n_variations)
    latent_vectors = np.zeros((n_variations, latent_dim))
    for i, variation in enumerate(feature_variations):
        latent_vectors[i] = feature_vector  # start from the same base vector
        latent_vectors[i, 0] = variation  # Vary the first feature for demonstration

    generated_images = decoder.predict(latent_vectors)
    generated_images = generated_images.reshape((n_variations, 28, 28))

    plt.figure(figsize=(15, 2))
    for i in range(n_variations):
        plt.subplot(1, n_variations, i + 1)
        plt.imshow(generated_images[i], cmap='gray')
        plt.axis('off')
    plt.show()

# Example feature vector
example_feature_vector = np.random.normal(size=(latent_dim,))

# Explore specific latent features
explore_latent_features(decoder, latent_dim, example_feature_vector)

This example defines a function, explore_latent_features, that visualizes how varying a single latent dimension changes the output of a generative model such as a Variational Autoencoder (VAE). It takes a decoder model, the dimensionality of the latent space (latent_dim), a base latent vector (feature_vector), and parameters controlling the range (variation_range) and number (n_variations) of values to try.

The function first builds n_variations copies of the base vector and overwrites one coordinate in each copy with evenly spaced values from variation_range. It then uses the decoder to generate images from these latent vectors and reshapes the images for visualization.

Next, it plots the generated images in a single row, so the effect of the varied latent dimension can be read from left to right. The script then calls the function with an example base vector sampled from a standard normal distribution.

In the example, the first feature is varied for demonstration. However, you can modify the index to explore other latent features.
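Making the index a parameter turns the hard-coded `latent_vectors[i, 0]` into a reusable sweep. The sketch below only builds the latent vectors (the hypothetical `feature_index` argument selects the dimension), so its output can be passed to `decoder.predict` and plotted exactly as in explore_latent_features. In a standard VAE there is no guarantee that a single dimension corresponds to one interpretable feature, so sweeping several indices is usually more informative than sweeping one.

```python
import numpy as np

def make_feature_sweep(base_vector, feature_index=0, variation_range=(-3, 3), n_variations=10):
    """Copies of base_vector with one coordinate swept across variation_range."""
    base_vector = np.asarray(base_vector, dtype=float)
    latent_vectors = np.tile(base_vector, (n_variations, 1))  # shape (n_variations, latent_dim)
    latent_vectors[:, feature_index] = np.linspace(*variation_range, n_variations)
    return latent_vectors

base = np.array([0.1, -0.2, 0.3])
sweep = make_feature_sweep(base, feature_index=2, variation_range=(-1, 1), n_variations=5)
print(sweep)
```

For example, `for k in range(latent_dim): images = decoder.predict(make_feature_sweep(example_feature_vector, feature_index=k))` would sweep every latent dimension in turn.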