A Mini Model for Classifying Digits

To practice building classification models, we simplified an example from the TensorFlow.js tutorials into a mini version of the digit classifier. It could later be extended to classify images of domino tiles based on the dots drawn on them.
We reduced the model to training on and classifying only three digits: 0, 1, and 2. This simplification yields a model that requires less data and can serve as a stepping stone towards building other models. We also experimented with the layer architecture and reduced the number of nodes while maintaining the model's prediction accuracy.
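Restricting the data to three digits can be sketched in plain JavaScript as below. This is only an illustration, not the code used in the project: the helper name filterByLabel and the tiny sample arrays are hypothetical, and real inputs would hold 784 values per image.

```javascript
// Hypothetical helper: keep only the examples whose label is in `keep`.
// `inputs` is an array of pixel arrays, `outputs` the matching integer labels.
function filterByLabel(inputs, outputs, keep) {
  const wanted = new Set(keep);
  const filteredInputs = [];
  const filteredOutputs = [];
  for (let i = 0; i < outputs.length; i++) {
    if (wanted.has(outputs[i])) {
      filteredInputs.push(inputs[i]);
      filteredOutputs.push(outputs[i]);
    }
  }
  return { inputs: filteredInputs, outputs: filteredOutputs };
}

// Made-up miniature data set: four "images" of one pixel each.
const demo = filterByLabel(
  [[0.1], [0.2], [0.3], [0.4]],
  [0, 7, 2, 9],
  [0, 1, 2]
);
console.log(demo.outputs); // only the labels 0 and 2 remain
```

Filtering inputs and outputs in the same loop keeps each image aligned with its label.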

Use the "Evaluate" button to test how the model classifies a new image from the validation set. To try your own digit, click "Draw New", draw a digit on the top canvas with the mouse, and then click "Save" to have the model try to recognize it.

Image: input digit 2 in grayscale; predicted digit 2 in green.

The MNIST (Modified National Institute of Standards and Technology) data set consists of 60,000 training images and 10,000 test images. Each image is a 28 x 28 pixel (784 pixels in total) handwritten digit.

Data format: The tutorial example that we extended had its training data preprocessed and saved as a data.js file. Each image is converted to grayscale, so a single value per pixel is kept instead of separate r, g, b channel values, and that value is normalized by dividing by 255. Thus, an array of 784 (28 x 28) values between 0 and 1 is saved for every image in the training set. Training data in this format is easy to handle in p5.js, since uploading and downloading many separate image files is a challenge in the browser. To classify images from the drawing app, we convert them to arrays of the same format as used in training.
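The conversion described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the function name rgbaToInput is hypothetical, the pixel buffer is assumed to be a flat RGBA array (the layout of a p5.js pixels array at pixel density 1, or of a canvas ImageData.data buffer), the drawing is assumed to be already scaled to 28 x 28, and grayscale is taken as the plain average of the three color channels.

```javascript
// Hypothetical helper: flat RGBA buffer -> array of grayscale values in [0, 1].
// Assumes 4 bytes per pixel (r, g, b, a), each in the range 0..255.
function rgbaToInput(pixels, width = 28, height = 28) {
  const numPixels = width * height;
  if (pixels.length !== numPixels * 4) {
    throw new Error(`expected ${numPixels * 4} values, got ${pixels.length}`);
  }
  const input = new Array(numPixels);
  for (let i = 0; i < numPixels; i++) {
    const r = pixels[4 * i];
    const g = pixels[4 * i + 1];
    const b = pixels[4 * i + 2];
    // Average the channels (assumed grayscale rule), then normalize by 255.
    input[i] = (r + g + b) / 3 / 255;
  }
  return input;
}

// Tiny 2 x 2 example: black, white, dark gray, and pure red pixels.
const sample = rgbaToInput(
  [0, 0, 0, 255, 255, 255, 255, 255, 51, 51, 51, 255, 255, 0, 0, 255],
  2,
  2
);
console.log(sample.length); // 4 values, one per pixel
```

The alpha channel is ignored here; a drawing app with a transparent background would need to account for it.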

An advantage of this format is that the data stays in a single file that can be uploaded to p5.js; the file size can be up to 5 MB. With this format, data is treated as code, so we don't have to read it with functions like loadImage(), as long as data.js provides the inputs array along with the corresponding labels in the outputs array.
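Because the data arrives as code, a quick sanity check before training can catch a malformed data.js. The sketch below is hypothetical and not part of the original project: the function name checkDataset and its parameters are assumptions, and it simply verifies the shape described above (one label per image, 784 values between 0 and 1 per image, labels in the range 0 to 2).

```javascript
// Hypothetical validator for the arrays that data.js is expected to provide.
// Returns a list of human-readable problems; an empty list means the data looks fine.
function checkDataset(inputs, outputs, numPixels = 784, numClasses = 3) {
  const problems = [];
  if (inputs.length !== outputs.length) {
    problems.push(`${inputs.length} images but ${outputs.length} labels`);
  }
  inputs.forEach((image, i) => {
    if (image.length !== numPixels) {
      problems.push(`image ${i} has ${image.length} values, expected ${numPixels}`);
    } else if (image.some((v) => !(v >= 0 && v <= 1))) {
      problems.push(`image ${i} has values outside the range 0..1`);
    }
  });
  outputs.forEach((label, i) => {
    if (!Number.isInteger(label) || label < 0 || label >= numClasses) {
      problems.push(`label ${i} is ${label}, expected an integer below ${numClasses}`);
    }
  });
  return problems;
}

// Made-up example with one all-black image labeled 2.
const report = checkDataset([new Array(784).fill(0)], [2]);
console.log(report.length === 0 ? 'data looks fine' : report.join('\n'));
```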

For a flavor of how tensors are used in the model, we list below the code for the outputs tensor. It takes the outputs array from the data.js file, which contains the labels 0, 1, and 2 corresponding to the inputs, and turns them into one-hot encodings: arrays of zeros with a single value of 1 whose position maps back to the original label. For the three labels, the one-hot encodings are [1,0,0] for 0, [0,1,0] for 1, and [0,0,1] for 2.

const OUTPUTS_TENSOR = tf.oneHot(tf.tensor1d(OUTPUTS, 'int32'), 3);
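To make the encoding concrete without depending on TensorFlow.js, here is a plain JavaScript sketch of what the one-hot step produces for a list of labels, together with the reverse mapping from a prediction back to a label. The helper names oneHot and argMax are hypothetical, and the prediction values are made up for illustration.

```javascript
// Hypothetical plain-JS equivalent of one-hot encoding a list of integer labels.
function oneHot(labels, numClasses) {
  return labels.map((label) => {
    if (!Number.isInteger(label) || label < 0 || label >= numClasses) {
      throw new RangeError(`label ${label} is outside 0..${numClasses - 1}`);
    }
    const row = new Array(numClasses).fill(0);
    row[label] = 1; // the position of the 1 is the label itself
    return row;
  });
}

// Index of the largest value: maps a row of class scores back to a label.
function argMax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

const encoded = oneHot([0, 1, 2], 3);
console.log(encoded); // rows [1,0,0], [0,1,0], [0,0,1]

// Made-up scores for the three classes; the highest score is at index 2.
const predictedLabel = argMax([0.05, 0.15, 0.8]);
console.log(predictedLabel);
```

Applying argMax to a one-hot row returns the original label, which is why the encoding loses no information.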

Extensions

The most promising direction for building and training models in the browser with small data sets seems to be transfer learning, that is, building on top of an existing base model. Building a teachable machine on top of MobileNet is discussed in the tutorial video 5.3: Using layers models for transfer learning. Our code for recognizing shapes uses features from a base model as input to a new classification model.