Number Representations & States

"how numbers are stored and used in computers"

MNIST with Keras

This Python script trains a Convolutional Neural Network (CNN) using Keras to recognize handwritten digits from the MNIST dataset. The MNIST dataset contains 60,000 training images and 10,000 testing images of handwritten digits 0 through 9, each stored as a 28×28 grid of grayscale pixels (i.e. integer intensity values from 0 to 255).

This is an illustrative example of loading and preprocessing data, normalizing pixel values to the interval [0, 1], and reshaping image data to include a channel dimension for compatibility with convolutional layers. You will commonly find yourself using techniques similar to the ones here, such as converting labels to one-hot encoded vectors for classification and constructing a sequential CNN model with layers designed to extract features from the images while reducing dimensionality.

Data preparation

You will need to install Keras (for example, with pip install keras) to follow along with this tutorial.

code.py
# mnist_convnet.py
import numpy as np
import keras
from keras import layers
from keras.utils import to_categorical

There are ten possible digits that can be classified, and each digit is represented by a 28×28 pixel grayscale image.

code.py
# mnist_convnet.py
num_classes = 10
input_shape = (28, 28, 1)

The mnist.load_data() function returns two tuples, one with the training images and labels and one with the test images and labels.

code.py
# Load the data and split it between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
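A quick sanity check on what was loaded can help catch problems early. This sketch just prints the array shapes; the values shown in the comments are the standard MNIST dimensions.

code.py
# Verify the array shapes returned by load_data()
print(x_train.shape)  # (60000, 28, 28)
print(y_train.shape)  # (60000,)
print(x_test.shape)   # (10000, 28, 28)
print(y_test.shape)   # (10000,)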

First we normalize the image data to the range [0, 1] by dividing by 255, which generally improves training performance.

code.py
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
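If you want to confirm the rescaling worked, a minimal check (this assumes the dataset contains both pure-black and pure-white pixels, which MNIST does):

code.py
# After dividing by 255, pixel values should span [0.0, 1.0]
print(x_train.min(), x_train.max())  # 0.0 1.0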

This next step is required for Conv2D layers: we use np.expand_dims to add a channel dimension at the end (axis -1) of the x_train array. The original shape of x_train is (60000, 28, 28), i.e. sixty thousand images that are each 28×28 pixels, but Keras expects each image to have a shape like (height, width, channels). Since these are grayscale images, they have only one channel, unlike color images, which have three (red, green, and blue). Adding the extra dimension reshapes the data to (60000, 28, 28, 1), which makes it compatible with the convolutional layers, which expect a three-dimensional image specification.

code.py
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
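You can verify the new channel dimension the same way as before:

code.py
# Each image now has an explicit single-channel axis
print(x_train.shape)  # (60000, 28, 28, 1)
print(x_test.shape)   # (10000, 28, 28, 1)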

This next step converts class labels to one-hot encoded vectors, so the label 3 would become [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] and 7 would become [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]. A well-trained model should be able to take an arbitrary input image and generate an output vector that is closest in distance to the one-hot vector of the correct classification.

code.py
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
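To make the encoding concrete, here is a quick sketch you can run on its own (the labels 3 and 7 are arbitrary examples):

code.py
# to_categorical turns integer labels into one-hot rows
print(to_categorical([3, 7], 10))
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]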

Training

Now that the data has been prepared, it's finally time to start training! We will start by defining batch_size, the number of training examples the model processes in each gradient-update step. We will also define epochs, the number of complete passes the model makes over the training dataset.

code.py
batch_size = 128
epochs = 3
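To get a feel for what these numbers imply, here is the batch arithmetic as a sketch. It assumes all 60,000 training images are used per epoch; the validation split applied later reduces the count slightly.

code.py
import math

batches_per_epoch = math.ceil(60_000 / 128)  # 469 batches per epoch
total_updates = batches_per_epoch * 3        # 1407 gradient updates over 3 epochs
print(batches_per_epoch, total_updates)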

Now we can define our model with keras.Sequential, which builds a CNN by stacking the given layers in order.

code.py
model = keras.Sequential(
    [
        layers.Input(shape=input_shape),  # Input layer matching image shape
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),  # First convolutional layer (32 filters, 3x3 kernel)
        layers.MaxPooling2D(pool_size=(2, 2)),  # Downsamples the feature map
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),  # Second convolutional layer (64 filters)
        layers.MaxPooling2D(pool_size=(2, 2)),  # Second pooling layer
        layers.Flatten(),  # Flattens 2D feature maps into 1D vector
        layers.Dropout(0.5),  # Randomly drop 50% of activations to prevent overfitting
        layers.Dense(num_classes, activation="softmax"),  # Output layer: 10 units, one for each class
    ]
)
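It can help to trace how each layer transforms the data. With Keras's default "valid" padding, a 3×3 convolution trims each spatial dimension by 2, and a 2×2 max pool halves it (rounding down):

code.py
# Feature-map shapes through the network (default "valid" padding):
#   Input:              (28, 28, 1)
#   Conv2D(32, 3x3):    (26, 26, 32)   # 28 - 3 + 1 = 26
#   MaxPooling2D(2x2):  (13, 13, 32)   # 26 / 2 = 13
#   Conv2D(64, 3x3):    (11, 11, 64)   # 13 - 3 + 1 = 11
#   MaxPooling2D(2x2):  (5, 5, 64)     # floor(11 / 2) = 5
#   Flatten:            (1600,)        # 5 * 5 * 64 = 1600 inputs to Dense

The model.summary() call in the next step will print these same shapes, so you can check the arithmetic against Keras's own output.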

Let's display the model architecture and compile it with the specified loss function, optimizer, and evaluation metric.

code.py
model.summary()

model.compile(
    loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
)
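Categorical cross-entropy measures how far the predicted probability distribution is from the one-hot target; for a one-hot target it reduces to the negative log of the probability assigned to the correct class. A minimal sketch with made-up numbers:

code.py
import numpy as np

# Hypothetical prediction for an image whose true label is 3
y_true = np.zeros(10)
y_true[3] = 1.0
y_pred = np.full(10, 0.01)
y_pred[3] = 0.91  # probabilities sum to 1.0

# For a one-hot target, cross-entropy = -log(probability of the true class)
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # ~0.094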

Finally, let's train the model (holding out 10% of the training data for validation) and then evaluate it on the test data.

code.py
model.fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.1
)

score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
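Once trained, the model can classify new images. Here is a minimal sketch using the first test image; the printed digit depends on the trained weights and the dataset ordering.

code.py
# Predict softmax probabilities for one test image and take the argmax
probs = model.predict(x_test[:1])            # shape (1, 10)
predicted_digit = int(np.argmax(probs, axis=1)[0])
print("Predicted digit:", predicted_digit)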