Learn how to train a simple CNN in TensorFlow and how to convert it to ONNX or TensorFlow.js for deployment.
July 26, 2024
By: Abhilaksh Singh Reen
Today, we'll be creating a Convolutional Neural Network in TensorFlow that can classify handwritten digits. It's nothing fancy, but we'll export our model to ONNX and TensorFlow.js and use it down the line in several other deployment tutorials.
In our codebase, we'll have files for loading the data, defining the model, testing, and training (plus callbacks), configuration, and conversions to ONNX and TFJS. The directory structure of our project will look something like the following:
```
│   config.yaml
│   requirements.txt
│
├───data
├───models
└───src
    │   callbacks.py
    │   config.py
    │   dataset.py
    │   model.py
    │   test.py
    │   train.py
    │   __init__.py
    │
    └───converters
            converter_onnx.py
            converter_tfjs.py
```
Inside the project directory, we'll create a new file called config.yaml to store our training configuration. Right now, we'll just store the number of epochs.
```yaml
training_num_epochs: 10
```
Next, we'll create a folder called src and, inside it, a file called config.py that loads the YAML config into Python and also defines the directory paths needed during training and testing.
```python
from os.path import dirname, join as path_join

from yaml import FullLoader as yaml_FullLoader, load as yaml_load

config_file_path = path_join(dirname(dirname(__file__)), "config.yaml")
with open(config_file_path, "r") as model_params_file:
    config = yaml_load(model_params_file, Loader=yaml_FullLoader)

models_dir = path_join(dirname(dirname(__file__)), "models")
data_dir = path_join(dirname(dirname(__file__)), "data")
```
In the src folder, we create a new file called dataset.py.
```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import normalize

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = normalize(x_train, axis=1)
x_test = normalize(x_test, axis=1)
# x_train = x_train / 255
# x_test = x_test / 255

data_loaders = {
    "train": [x_train, y_train],
    "test": [x_test, y_test],
}
```
Note that instead of using tensorflow.keras.utils.normalize, we could also normalize our images by dividing the pixel values by 255 (the commented-out lines above), which maps them from the 0-255 range to 0-1. The two are not equivalent: normalize divides the values by their L2 norm along the given axis (here, axis=1), so whichever option we pick, inference must use exactly the same preprocessing as training. The downside of tensorflow.keras.utils.normalize is that at inference time we would need TensorFlow as a dependency just to preprocess our input images the same way we did during training. Dividing by 255 avoids that dependency. Later on, we'll see how we can create a custom implementation of the function.
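For reference, the computation behind normalize is small. Below is a minimal NumPy sketch that mirrors the Keras 2.x definition (divide by the L2 norm along `axis`, leaving all-zero slices unchanged). The name `normalize_np` is our own, and it's worth comparing its output against Keras on your own data before relying on it.

```python
import numpy as np


def normalize_np(x, axis=-1, order=2):
    """NumPy-only stand-in for tf.keras.utils.normalize (Keras 2.x semantics)."""
    # Norm of x along `axis`; atleast_1d keeps the zero-fix below working for 1-D input.
    norms = np.atleast_1d(np.linalg.norm(x, order, axis))
    # Avoid division by zero: all-zero slices are left as zeros.
    norms[norms == 0] = 1
    # Re-insert the reduced axis so the norms broadcast back over x.
    return x / np.expand_dims(norms, axis)
```

Called as `normalize_np(x_test, axis=1)`, it should reproduce the preprocessing in dataset.py without importing TensorFlow.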
Inside the src directory, we create the model.py file that defines our TF model.
```python
from keras import optimizers
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.models import Sequential


class CNN(Sequential):
    def __init__(self):
        super().__init__()

        self.add(Conv2D(10, kernel_size=5, input_shape=(28, 28, 1)))
        self.add(MaxPooling2D(pool_size=(2, 2)))
        self.add(Conv2D(20, kernel_size=5))
        self.add(MaxPooling2D(pool_size=(2, 2)))
        self.add(Dropout(0.5))
        self.add(Flatten())
        self.add(Dense(50, activation='relu'))
        self.add(Dropout(0.5))
        self.add(Dense(10, activation='softmax'))

        self.optimizer = optimizers.Adam(learning_rate=0.001)
        self.compile(optimizer=self.optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```
Defining the model as a custom class (like CNN) may result in an Unknown layer error when the saved model is loaded somewhere that doesn't know about the CNN class, for example with keras.models.load_model during testing or in TensorFlow.js. Alternatively, the model can be defined like this:
```python
# Uses the same imports as model.py above.
model = Sequential()

model.add(Conv2D(10, kernel_size=5, input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(20, kernel_size=5))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

optimizer = optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```
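To make the architecture concrete, here is a small plain-Python calculation of the output shapes and parameter counts of the layers above. It assumes Keras defaults: padding="valid" and stride 1 for Conv2D, and a stride equal to pool_size for MaxPooling2D. Dropout layers keep the shape and have no parameters, so they are omitted. The helper names are our own.

```python
def conv_out(size, kernel):
    # "valid" padding, stride 1: the kernel must fit entirely inside the input.
    return size - kernel + 1


def pool_out(size, pool):
    # Non-overlapping pooling (stride == pool size); leftover rows/cols are dropped.
    return size // pool


def cnn_summary():
    """Return (layer, output shape, parameter count) rows for the CNN above."""
    rows = []
    h, c = 28, 1  # 28x28 grayscale input with 1 channel

    # Conv2D(10, kernel_size=5): 5*5*in_channels weights per filter, plus one bias.
    params = (5 * 5 * c + 1) * 10
    h, c = conv_out(h, 5), 10
    rows.append(("conv1", (h, h, c), params))
    h = pool_out(h, 2)
    rows.append(("pool1", (h, h, c), 0))

    # Conv2D(20, kernel_size=5) on the 10-channel feature maps.
    params = (5 * 5 * c + 1) * 20
    h, c = conv_out(h, 5), 20
    rows.append(("conv2", (h, h, c), params))
    h = pool_out(h, 2)
    rows.append(("pool2", (h, h, c), 0))

    # Flatten, then the two Dense layers (weights + biases).
    flat = h * h * c
    rows.append(("flatten", (flat,), 0))
    rows.append(("dense1", (50,), flat * 50 + 50))
    rows.append(("dense2", (10,), 50 * 10 + 10))
    return rows


for name, shape, params in cnn_summary():
    print(f"{name:8s} {str(shape):14s} {params}")
print("total params:", sum(p for _, _, p in cnn_summary()))  # total params: 21840
```

The Flatten layer therefore feeds 4 × 4 × 20 = 320 features into the first Dense layer, and the total should agree with what model.summary() reports.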
In Keras, while training our model, we can pass callbacks to the model.fit function. A callback is an object that can define functions that get called on certain events during the training process.
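Conceptually, fit runs the training loop and calls the matching method on every callback when each event happens. The toy loop below is a plain-Python illustration of that hook pattern, not Keras's actual implementation. `Callback`, `RecordLossCallback`, `toy_fit` and the halving "loss" are all made up for the example.

```python
class Callback:
    """Minimal stand-in for keras.callbacks.Callback: every hook is a no-op."""

    def on_train_begin(self, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass


class RecordLossCallback(Callback):
    """Example hook that records the loss reported at the end of each epoch."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, epoch, logs=None):
        self.history.append((epoch, logs["loss"]))


def toy_fit(num_epochs, callbacks=()):
    """Hypothetical training loop that fires hooks the way fit() does."""
    for cb in callbacks:
        cb.on_train_begin()
    loss = 1.0
    for epoch in range(num_epochs):  # epoch indices start at 0, as in Keras
        loss *= 0.5  # pretend one epoch of training halves the loss
        for cb in callbacks:
            cb.on_epoch_end(epoch, logs={"loss": loss})


recorder = RecordLossCallback()
toy_fit(3, callbacks=[recorder])
print(recorder.history)  # [(0, 0.5), (1, 0.25), (2, 0.125)]
```

Subclasses only override the hooks they care about; the base class supplies no-op defaults for the rest.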
Inside the src directory, we create a file called callbacks.py.
```python
from os.path import join as path_join

from keras import callbacks


class SaveModelPerEpochCallback(callbacks.Callback):
    def __init__(self, models_save_dir):
        super().__init__()
        self.models_save_dir = models_save_dir

    def on_epoch_end(self, epoch, logs=None):
        model_save_file_path = path_join(self.models_save_dir, f"epoch-{epoch}.h5")
        self.model.save_weights(model_save_file_path, save_format="h5")
```
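Here, we've defined a class that inherits from keras.callbacks.Callback and overrides its on_epoch_end method, which Keras calls at the end of every epoch. In it, we just save the model weights to an HDF5 file. Keras passes a zero-based epoch index, so training for 10 epochs produces epoch-0.h5 through epoch-9.h5.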
In the same directory, create a file called train.py.
```python
from datetime import datetime
from os import makedirs
from os.path import join as path_join

from .callbacks import SaveModelPerEpochCallback
from .config import config, models_dir
from .dataset import data_loaders
from .model import CNN


def train(models_save_dir, num_epochs):
    model = CNN()

    x_train, y_train = data_loaders['train']

    save_model_per_epoch_callback = SaveModelPerEpochCallback(models_save_dir)

    model.fit(x_train, y_train, epochs=num_epochs, callbacks=[save_model_per_epoch_callback])


if __name__ == "__main__":
    training_id = "tensorflow---" + datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    models_save_dir = path_join(models_dir, training_id)
    makedirs(models_save_dir)

    train(models_save_dir, config['training_num_epochs'])
```
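Training in Keras is much simpler than training in PyTorch: we just call model.fit with our training data and labels, the number of epochs, and, optionally, a list of callbacks. Since train.py uses relative imports, run it as a module from the project directory with python -m src.train; running python src/train.py directly fails with an import error.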
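Because test.py and the converters need a concrete checkpoint path, it can help to locate the newest one automatically. The sketch below uses only the standard library and assumes the models/tensorflow---<timestamp>/epoch-<n>.h5 layout produced by train.py and SaveModelPerEpochCallback. `latest_checkpoint` is a hypothetical helper, not part of the original code.

```python
import re
from os import listdir
from os.path import isdir, join as path_join

EPOCH_FILE_RE = re.compile(r"^epoch-(\d+)\.h5$")


def latest_checkpoint(models_dir):
    """Return the path of the highest-epoch weights file in the newest training run."""
    runs = [d for d in listdir(models_dir)
            if d.startswith("tensorflow---") and isdir(path_join(models_dir, d))]
    if not runs:
        raise FileNotFoundError(f"No training runs found in {models_dir}")
    # Zero-padded %Y-%m-%d-%H-%M-%S timestamps sort chronologically as strings.
    run_dir = path_join(models_dir, max(runs))

    epochs = []
    for name in listdir(run_dir):
        match = EPOCH_FILE_RE.match(name)
        if match:
            epochs.append((int(match.group(1)), name))
    if not epochs:
        raise FileNotFoundError(f"No epoch-*.h5 files found in {run_dir}")
    # Compare epoch numbers as integers so that epoch-10 beats epoch-9.
    return path_join(run_dir, max(epochs)[1])
```

With it, MODEL_WEIGHTS_FILE_PATH could be set to latest_checkpoint(models_dir). Keep in mind that the last epoch is not necessarily the one with the best accuracy.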
Inside the src folder, we create one more file called test.py. In this file, we first evaluate the model on the MNIST test set loaded by our data loader. After that, we test it on custom images and labels stored in the data directory. Inside the data directory, create a new folder called test_images and place some 28x28 handwritten digit images inside it. The images should have a black background and a white foreground, like the MNIST digits. In the data directory, also create a file called test_images_labels.json that maps each image's file name to its ground-truth label. Here's an example of data/test_images_labels.json:
```json
{
    "1.png": 1,
    "2.png": 7,
    "3.png": 2,
    "4.png": 9,
    "5.png": 8,
    "6.png": 5,
    "7.png": 1,
    "8.png": 7,
    "9.png": 1,
    "10.png": 7,
    "11.png": 7,
    "12.png": 0,
    "13.png": 5,
    "14.png": 3,
    "15.png": 2,
    "16.png": 1,
    "17.png": 0,
    "18.png": 8,
    "19.png": 7,
    "20.png": 4
}
```
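Since MNIST digits are light strokes on a dark background, images drawn or photographed the other way round will usually be misclassified. Below is a minimal NumPy sketch for checking and fixing polarity before saving images into data/test_images. It assumes 8-bit grayscale 28x28 arrays. The function name and the mean-brightness threshold of 127 are our own heuristic, not part of the original code.

```python
import numpy as np


def to_mnist_polarity(image, threshold=127):
    """Return a copy of an 8-bit grayscale image as light-on-dark, like MNIST."""
    image = np.asarray(image, dtype=np.uint8)
    if image.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got shape {image.shape}")
    # Heuristic: a mostly bright image is assumed to be dark ink on a light background.
    if image.mean() > threshold:
        return 255 - image
    return image.copy()
```

A mean-brightness test is crude: a very thick digit on a dark background could be misread as inverted, so eyeballing a few results is still worthwhile.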
And here's the entire src/test.py.
```python
from json import load as json_load
from os import listdir
from os.path import join as path_join

import cv2
import numpy as np
from tensorflow.keras.utils import normalize

from .config import models_dir, data_dir
from .dataset import data_loaders
from .model import CNN

# Replace "training_id" and "epoch_number" with the values from your own training run.
MODEL_WEIGHTS_FILE_PATH = path_join(models_dir, "training_id", "epoch-epoch_number.h5")

if __name__ == "__main__":
    model = CNN()
    model.load_weights(MODEL_WEIGHTS_FILE_PATH)

    ### Test on MNIST Test Data
    x_test, y_test = data_loaders['test']
    loss, accuracy = model.evaluate(x_test, y_test)
    print("Testing on MNIST Test Set")
    print(f"Loss: {loss}, Accuracy: {accuracy}")
    print()

    ### Test on Custom Images
    # Load Images
    test_images_dir = path_join(data_dir, "test_images")
    test_images_labels_file_path = path_join(data_dir, "test_images_labels.json")
    with open(test_images_labels_file_path, 'r') as test_images_labels_file:
        test_images_labels = json_load(test_images_labels_file)

    test_images = []
    test_labels = []
    for image_name in sorted(listdir(test_images_dir)):
        image_path = path_join(test_images_dir, image_name)
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)  # 28x28 uint8 array
        test_images.append(image)
        test_labels.append(test_images_labels[image_name])

    # Preprocess: add a batch axis, then normalize exactly as in dataset.py
    for i in range(len(test_images)):
        test_images[i] = np.array([test_images[i]])
        test_images[i] = normalize(test_images[i], axis=1)

    # Predict
    num_correct = 0
    for image, label in zip(test_images, test_labels):
        prediction = model.predict(image, verbose=0)  # verbose=0 hides the progress bar
        predicted_label = int(np.argmax(prediction))
        # print(f"{predicted_label} : {label}")
        num_correct += predicted_label == label

    accuracy = num_correct / len(test_images)
    print(f"Correct: {num_correct} / {len(test_images)}, Accuracy: {accuracy}")
```
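Calling model.predict once per image works, but for a larger folder it is simpler to stack the preprocessed images into a single batch, call predict once, and compute accuracy from the resulting (N, 10) array of class probabilities. Below is a minimal NumPy sketch of that last step. `accuracy_from_probs` is a hypothetical helper, and the probabilities in the example are made up.

```python
import numpy as np


def accuracy_from_probs(probs, labels):
    """Fraction of rows whose highest-probability class equals the true label."""
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    predicted = probs.argmax(axis=1)  # one predicted digit per image
    return float((predicted == labels).mean())


# With the real model: probs = model.predict(np.concatenate(test_images), verbose=0)
fake_probs = np.array([
    [0.1, 0.8, 0.1] + [0.0] * 7,    # predicts 1
    [0.9, 0.05, 0.05] + [0.0] * 7,  # predicts 0
])
print(accuracy_from_probs(fake_probs, [1, 7]))  # 0.5
```

Since each entry of test_images has shape (1, 28, 28), np.concatenate joins them into a single (N, 28, 28) batch.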
In the src directory, create another directory called converters, and in it create a file called converter_onnx.py.
```python
from os.path import dirname, join as path_join

import onnx
import tf2onnx

from ..config import models_dir
from ..model import CNN

# Replace "training_id" and "epoch_number" with the values from your own training run.
MODEL_WEIGHTS_FILE_PATH = path_join(models_dir, "training_id", "epoch-epoch_number.h5")
ONNX_MODEL_FILE_PATH = path_join(dirname(MODEL_WEIGHTS_FILE_PATH), "model.onnx")

if __name__ == "__main__":
    keras_model = CNN()
    keras_model.load_weights(MODEL_WEIGHTS_FILE_PATH)

    # from_keras returns (onnx_model_proto, external_tensor_storage)
    onnx_model, _ = tf2onnx.convert.from_keras(keras_model)
    onnx.save(onnx_model, ONNX_MODEL_FILE_PATH)
    print(f"ONNX model saved at: {ONNX_MODEL_FILE_PATH}")
```
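We initialize our model and load its weights from disk. The Keras-to-ONNX conversion is handled by the tf2onnx package: tf2onnx.convert.from_keras returns the ONNX model along with external tensor storage, which we don't need here. To save the ONNX model to disk, we use onnx.save. As with train.py, run the script as a module from the project directory: python -m src.converters.converter_onnx.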
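When the exported model is later run with ONNX Runtime, the input has to match the graph's declared input exactly. tf2onnx takes the input signature from the Keras model, which here is float32 with shape (batch, 28, 28, 1). Below is a minimal NumPy sketch of preparing a single image that way, reusing the same L2 normalization as dataset.py. `prepare_onnx_input` is a hypothetical helper. Confirm the actual input name, shape and type with `get_inputs()` on your own file.

```python
import numpy as np


def prepare_onnx_input(image):
    """Turn one 28x28 uint8 digit into a float32 batch of shape (1, 28, 28, 1)."""
    batch = np.asarray(image, dtype=np.float32)[np.newaxis]  # (1, 28, 28)
    # Same L2 normalization along axis=1 that dataset.py applies via normalize().
    norms = np.linalg.norm(batch, axis=1, keepdims=True)
    norms[norms == 0] = 1
    batch = batch / norms
    # Add the trailing channel axis declared by the model and force float32.
    return batch[..., np.newaxis].astype(np.float32)


# Assumed usage with onnxruntime (not part of the original code):
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
# input_name = session.get_inputs()[0].name
# probs = session.run(None, {input_name: prepare_onnx_input(image)})[0]
```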
Create a new file at src/converters/converter_tfjs.py.
```python
from os.path import dirname, join as path_join

from tensorflowjs import converters

from ..config import models_dir
from ..model import CNN

MODEL_WEIGHTS_FILE_PATH = path_join(models_dir, "tensorflow---2024-04-21-22-04-45", "epoch-9.h5")
TFJS_MODEL_FILE_PATH = path_join(dirname(MODEL_WEIGHTS_FILE_PATH), "model-tfjs")

if __name__ == "__main__":
    keras_model = CNN()
    keras_model.load_weights(MODEL_WEIGHTS_FILE_PATH)

    converters.save_keras_model(keras_model, TFJS_MODEL_FILE_PATH)
    print(f"TensorFlow JS model saved at: {TFJS_MODEL_FILE_PATH}")
```
The save_keras_model function in tensorflowjs.converters converts the Keras model to the TensorFlow.js Layers format and writes it to the given directory as a model.json file plus binary weight files.
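When deploying, model.json and every weight shard it references need to be served from the same location. Below is a minimal standard-library sketch that lists the shard files referenced by model.json and reports any that are missing. It assumes the Layers format's `weightsManifest` structure: a list of groups, each with a `paths` list relative to model.json. `list_weight_shards` is a hypothetical helper.

```python
import json
from os.path import exists, join as path_join


def list_weight_shards(tfjs_model_dir):
    """Return (all shard paths, missing shard paths) referenced by model.json."""
    with open(path_join(tfjs_model_dir, "model.json")) as f:
        model_json = json.load(f)
    # Each manifest group lists its shard files relative to model.json.
    shards = [path_join(tfjs_model_dir, p)
              for group in model_json["weightsManifest"]
              for p in group["paths"]]
    missing = [p for p in shards if not exists(p)]
    return shards, missing
```

Running it on the model-tfjs directory before copying it to a web server is a cheap way to catch a forgotten .bin file.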
Today we've learned how to build a CNN in TensorFlow (Keras) that can recognize handwritten digits (nothing new) and convert our model to ONNX and TensorFlow.js (still nothing new). This post is meant to serve as a baseline for future posts in which we'll deploy our ONNX and TFJS models, both on the backend and on the frontend.
See you next time :)