Getting started with machine learning

Note: this post was originally posted on Systek's website (where I work) in Norwegian. This post is slightly more detailed but essentially the same.

For many people, machine learning might seem like something magical used by wizards in their ivory towers. Perhaps it is, but getting started with ML is actually not that difficult, and hopefully this post will teach you your first cantrips and pave the path for you to become an archmage of machine learning in the future. Translation: we'll set up a "hello world" example of ML.

So the goal of this post is to create a simple computer-vision example where we train a model that is able to recognize handwritten digits.

Setting up the environment

To achieve this goal we will use TensorFlow, an open-source library for training ML models, and Keras, an API that simplifies this process. TensorFlow is used from Python. I will assume you already have Python installed; if not, there are plenty of tutorials out there that can help you with that.

I recommend using Anaconda (and Conda) to manage your virtual environment and dependencies, but you could use other alternatives if you want to. If so, skip this part.

I installed Anaconda Individual Edition which you can find here. If you follow the installation instructions you will get anaconda and conda.

To create an environment you can use the following code:

conda create --name <your-env-name> <...packages>

For future reference, if you want to remove an environment simply write:

conda env remove --name <your-env-name>

To be more explicit for this tutorial you can type:

conda create --name computer-vision tensorflow keras matplotlib

This will download and install the necessary packages for TensorFlow, Keras and Matplotlib (we'll need Matplotlib later). You will be prompted to proceed with the download and install, so simply type y to continue.

In our computer-vision environment we've installed tensorflow and keras, but we need to activate that environment and you can do so like this:

conda activate computer-vision

Create a basic Python file called index.py and open it in your favorite editor or IDE. Personally I use PyCharm (Anaconda) for Python and ML. Add the following lines:

import tensorflow as tf

print(tf.__version__)

Execute the program and, depending on when you're reading this, you should hopefully get 2.2.0 or something similar. If so, we've set up our environment correctly. Huzzah!

Using MNIST

The MNIST dataset is a collection of handwritten digits from 0 to 9. It contains 60 000 training images and 10 000 testing images. It's commonly used for training and testing in ML.

Luckily, it is quite easy to get started with since Keras provides a method for us to load the dataset. Let us modify our index.py to the following:

import tensorflow as tf

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Verify that we have 60 000 training images and 10 000 testing images
print(len(x_train))  # should be 60000
print(len(x_test))  # should be 10000

Before we continue, remove the print statements as we no longer need them. The pixel values range from 0 to 255, and a common practice is to normalize them so they lie between 0 and 1. As far as I understand, the activation functions in our neural network work better with such ranges. Add this one line to normalize our values:

x_train, x_test = x_train / 255.0, x_test / 255.0
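
To get a feel for what this line does, here is a minimal sketch in plain Python (no TensorFlow needed). The tiny 2x2 image is made up for illustration; the real MNIST images are 28x28, but the division works the same way on every pixel.

```python
# Hypothetical 2x2 grayscale "image" with pixel values in the range 0..255.
image = [
    [0, 51],
    [102, 255],
]

# Divide every pixel by 255.0 so that all values end up between 0 and 1.
normalized = [[pixel / 255.0 for pixel in row] for row in image]

print(normalized)
```

The darkest pixel (0) stays 0.0 and the brightest (255) becomes 1.0; everything else lands in between.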

Next we'll want to create our model. It will have one input layer, one hidden layer and one output layer. The images in the MNIST dataset are 28x28 pixels and grayscale, meaning we only have a single channel, so we tell our input layer what to expect with input_shape=(28, 28, 1). The input layer flattens each image into a 1D vector of 784 values (28*28), which is necessary for the dense layers that follow to do the classification. However, we lose spatial information (e.g. which pixels are next to each other). The hidden layer consists of 128 neurons and uses the rectified linear activation function (ReLU/relu). It's easier to think of the neurons as parameters to a function: the objective of our neural network is to find the parameter values that convert the 784 values we get from the input layer into one of our digits. The output layer contains as many neurons as we have classes (one for each digit).

model = tf.keras.Sequential([
    # input layer
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    # hidden layer
    tf.keras.layers.Dense(128, activation='relu'),
    # output layer
    tf.keras.layers.Dense(10, activation='softmax')
])
# To examine our model:
model.summary()
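
To make the flattening and the ReLU activation a bit more concrete, here is a rough sketch in plain Python of what a single neuron in the hidden layer computes. The function names (flatten, relu, neuron), the 2x2 image and the weights are all made up for illustration; Keras does the same kind of arithmetic for all 128 neurons at once, with weights it learns during training.

```python
def flatten(image):
    """Turn a 2D grid of pixels into a 1D list, row by row."""
    return [pixel for row in image for pixel in row]


def relu(value):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, value)


def neuron(inputs, weights, bias):
    """One dense-layer neuron: weighted sum of the inputs plus a bias, then ReLU."""
    return relu(sum(x * w for x, w in zip(inputs, weights)) + bias)


# Made-up 2x2 "image" and weights, for illustration only.
image = [[0.0, 0.5], [1.0, 0.25]]
inputs = flatten(image)  # [0.0, 0.5, 1.0, 0.25]
weights = [0.2, -0.4, 0.6, 0.8]

print(neuron(inputs, weights, bias=0.1))   # positive sum passes through
print(neuron(inputs, weights, bias=-2.0))  # negative sum is clipped to 0.0
```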

The output of model.summary() shows that the output shape of the input layer is 784 values (from 28*28 pixels).

Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               100480    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
=================================================================
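
The Param # column can be checked by hand. A dense layer has one weight for every (input, neuron) pair plus one bias per neuron, and the Flatten layer has nothing to learn. A small sketch (the helper name dense_param_count is made up here, it is not part of Keras):

```python
def dense_param_count(n_inputs, n_units):
    """Number of trainable parameters in a fully connected (dense) layer."""
    weights = n_inputs * n_units  # one weight per (input, neuron) pair
    biases = n_units              # one bias per neuron
    return weights + biases


hidden = dense_param_count(28 * 28, 128)  # flattened image -> hidden layer
output = dense_param_count(128, 10)       # hidden layer -> one neuron per digit

print(hidden, output, hidden + output)
```

This gives 100480 and 1290, matching the summary, for a total of 101770 trainable parameters.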

Next we'll want to compile our model. We need to specify an optimizer, a loss function and metrics. I'll briefly explain what these are, but I will not do a deep dive into any specific optimizer or loss function, as that would make this blog post too long and because I have limited knowledge of the topic.

The neural network does not know the relationship between the image and the categories (digits). When the NN makes a guess on the function that describes the relationship, the loss function measures how good the guess was. Then the optimizer figures out what the next guess should be based on the data from the loss function.

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
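
To make the loss function a little less abstract: for a single image, sparse categorical cross-entropy boils down to the negative logarithm of the probability the model assigned to the correct digit. The sketch below is plain Python with made-up probabilities, not the Keras implementation; Keras computes the equivalent over whole batches and averages the result.

```python
import math


def sparse_categorical_crossentropy(probabilities, true_label):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(probabilities[true_label])


# Made-up softmax outputs for the digits 0..9 (each list sums to 1).
confident = [0.01, 0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01]
unsure = [0.10] * 10

print(sparse_categorical_crossentropy(confident, true_label=2))  # small loss
print(sparse_categorical_crossentropy(unsure, true_label=2))     # larger loss
```

A confident, correct guess gives a loss close to 0, while an unsure guess is penalized more. The "sparse" part only means the labels are plain integers (like 2) rather than one-hot vectors.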

After we've compiled our model we can start training it. This is done by using model.fit. We will use the training data (images and categories) that we got from the MNIST dataset and we'll train it for five epochs. How long you should train your model depends on the training set and how much you increase your accuracy for every iteration.

model.fit(x_train, y_train, epochs=5)

I got an accuracy of 98.61%. Your results may vary slightly.
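
One rough way to reason about how many epochs are worth it is to look at how much the accuracy still improves from one epoch to the next. The sketch below is plain Python; the helper name epochs_until_plateau, the threshold and the accuracies are made up for illustration and are not the numbers from my run.

```python
def epochs_until_plateau(accuracies, min_improvement=0.001):
    """Return the number of epochs after which accuracy stopped improving
    by at least min_improvement (or len(accuracies) if it never plateaus)."""
    for epoch in range(1, len(accuracies)):
        if accuracies[epoch] - accuracies[epoch - 1] < min_improvement:
            return epoch
    return len(accuracies)


# Made-up per-epoch training accuracies, for illustration only.
accuracies = [0.925, 0.965, 0.975, 0.9755, 0.9757]
print(epochs_until_plateau(accuracies))
```

With these numbers the fourth epoch adds less than 0.1 percentage points, so three epochs would have been enough. Keras also ships an EarlyStopping callback that automates this kind of check.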

Now we need to evaluate how the model performs on data it has not seen before. On the evaluation I got an accuracy of 97.86% and a loss of 0.0711. Not too bad. Could we do better with e.g. more epochs?

model.evaluate(x_test, y_test, verbose=2)

We can see the progression of the loss and accuracy rate in each epoch, but consider cases where we have many more epochs. It would be nice to plot it in a graph. So let's try:

# Matplotlib was installed when we created the environment.
# If you skipped that step, run this in the terminal: conda install matplotlib

# With your other imports add this:
import matplotlib.pyplot as plt

# Change the previous model.fit call to this:
history = model.fit(x_train, y_train, epochs=5)

model.evaluate(x_test, y_test, verbose=2)

# Plot the training accuracy and loss for each epoch
plt.plot(history.history['accuracy'])
plt.plot(history.history['loss'])
plt.title('model accuracy and loss')
plt.ylabel('value')
plt.xlabel('epoch')
plt.legend(['accuracy', 'loss'], loc='upper left')
plt.show()

We see the improvement in accuracy and loss for each epoch.

Now that we're satisfied with our accuracy and loss, let's save our model. Create a directory called model in the root of the project and make the following modification to the code:

# Replace model.evaluate(x_test, y_test, verbose=2)
# with:
model.save('model/cv_model')

Execute the code again and you'll see our model in model/cv_model.

Testing our model

I wanted to test how well my model would evaluate my own handwritten digits, so I used GIMP to draw some 28x28 images. I used a black background with white text:

Handwritten digit of number 2

Let's create a new file that uses our stored model and reads our own images to see how well the model predicts. I called it cv.py (short for computer-vision). Start the file by importing tensorflow and loading in the model we saved to disk:

import tensorflow as tf
# We need this one later
import numpy as np 

saved_model = tf.keras.models.load_model('model/cv_model')

# This should give the same result as the summary above
saved_model.summary()

Now we need to load the images we want to test. I created a folder called test and placed the digits I painted in GIMP there (with a mouse, not a tablet!). Feel free to draw your own images, but create them in 28x28 pixels. Keras provides an easy way to load an image and convert it to our desired size (28x28 pixels, in case it's not already the correct size) and color_mode.

# I'm just numbering it based on what I drew to make it easier for
# me to see which image this is.
image_2 = tf.keras.preprocessing.image.load_img('test/2.jpg', color_mode='grayscale', target_size=(28, 28))
# Convert the image to an array of shape (28, 28, 1) with values normalized to 0..1.
image_2 = tf.keras.preprocessing.image.img_to_array(image_2) / 255.0
# Add a leading batch dimension, (28, 28, 1) -> (1, 28, 28, 1),
# since the model predicts on batches of images.
image_2 = np.expand_dims(image_2, axis=0)
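
If the extra dimension feels mysterious, the following sketch shows the same idea with plain nested lists instead of NumPy arrays. The shape helper is made up for illustration; with NumPy you would simply look at image_2.shape.

```python
def shape(nested):
    """Return the shape of a regularly nested list, e.g. (2, 3)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# A stand-in for one 28x28 grayscale image with a single channel.
image = [[[0.0] for _ in range(28)] for _ in range(28)]
print(shape(image))  # (28, 28, 1)

# Wrapping it in a list adds a leading batch dimension of size 1,
# which is what np.expand_dims(image, axis=0) does for NumPy arrays.
batch = [image]
print(shape(batch))  # (1, 28, 28, 1)
```

In other words, we hand the model a batch that happens to contain exactly one image.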

Now that we've loaded our image and processed our data we can attempt to predict the number.

predictions = saved_model.predict(image_2)

The prediction we get is how certain the model is that the image is one of the categories (digits). So let's just create a helper function that finds which category the model thinks is most similar to the image.

def find_predicted_value(probabilities):
    # Return the index (i.e. the digit) with the highest predicted probability
    return int(np.argmax(probabilities))


print(find_predicted_value(predictions[0]))
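
For reference, the values in predictions[0] come from the softmax activation in the output layer, which turns the ten raw neuron outputs into probabilities that sum to 1. Here is a minimal plain-Python sketch with made-up raw outputs (the names softmax and raw_outputs are illustrative, not the Keras internals):

```python
import math


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw outputs for the digits 0..9.
raw_outputs = [0.1, 0.3, 4.2, 0.5, 0.0, 0.2, 0.1, 1.1, 0.4, 0.3]
probabilities = softmax(raw_outputs)

# The predicted digit is the index of the highest probability.
predicted_digit = max(range(len(probabilities)), key=lambda i: probabilities[i])
print(predicted_digit)
```

Here the third value is by far the largest, so the predicted digit is 2.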

Testing our model in the browser

We have created a model that is able to predict digits and we've also tried to use our saved model and test it on some images we created in GIMP. We can also use this model on a web application so that we can let users draw digits and do predictions on the drawings. I'm not going to go into detail of that process now (I'll maybe save it for later), but you can test it here.

A screenshot where I try to draw the digit 4 in the web application

You will probably see that the predictions do not necessarily work so well with "real life" examples, but keep in mind this is just a simple neural network and there are still loads of improvements that can be made, e.g. training on rotated, skewed or flipped versions of the images and such.

In this example, we created a super simple neural network that produces moderately good test results and performs decently on our "real life" web test, but the images are hardly complex. A more common way to build a neural network for computer vision is to use a convolutional neural network (CNN). I'm planning on making a new tutorial about CNNs in the future. Stay tuned!