Serving Multiple TensorFlow Models

Part 3: Setting up Google’s TensorFlow serving application and hosting multiple models.

This post covers the process of setting up TensorFlow Serving and exposing the two models that were built and trained in the previous post. TensorFlow Serving is a system for managing machine learning models and exposing them to consumers via a standardized API. This post is part of the TensorFlow + Docker MNIST Classifier series.

One of the most practical ways of setting up TensorFlow Serving is via Google’s pre-built Docker container, and this is the approach that will be taken in this post.

Set up the basic docker image

The first step is to ensure we have a Docker serving image working correctly on our machine using one of the out-of-the-box testing models. Make sure you have Docker installed before running the scripts below in your command line.

If you are using Docker on Windows you might need to share your drives. This can be done by navigating to Docker settings > Shared Drives and making sure the drives you are working with are checked.

download the docker image and clone the repository
$ docker pull tensorflow/serving
$ git clone https://github.com/tensorflow/serving

Linux — launching the container

#map the path to the test models 
$ TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"
$ docker run --rm -p 8501:8501 \
    --name serving \
    --mount type=bind,source=$TESTDATA/saved_model_half_plus_two_cpu,target=/models/half_plus_two \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving

Windows — launching the container

#map the path to the test models 
$ set TESTDATA=%cd%/serving/tensorflow_serving/servables/tensorflow/testdata
$ docker run --rm -p 8501:8501 --name serving --mount type=bind,source=%TESTDATA%/saved_model_half_plus_two_cpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two tensorflow/serving

The serving application should now be running in the docker instance and exposed to your network on port 8501. You should see something like this if your container has launched successfully:

Now let's check that the model is up and running and see what it looks like. I will be using Postman to make some requests and analyze the responses. Let’s send a sample request and see if the model works. In this case, the endpoint is localhost:8501/v1/models/<model name>:predict, where <model name> is half_plus_two. In order for the model to serve a prediction, it requires the feature matrix to be provided in the body of the POST request, in the format {"instances": <features>}, where <features> is an array containing all the samples you want to classify. The model we are working with has input and output dimensions of 1, so we will use {"instances": [1.0, 2.0]} as our payload, and we expect results of 2.5 and 3.0 (x / 2 + 2).

POST endpoint: localhost:8501/v1/models/half_plus_two:predict
Payload: {"instances": [1.0, 2.0]}
Expected response: {"predictions": [2.5, 3.0]}

Submitting the request using Postman, we can see that the model is exposed and working exactly as expected!
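Outside Postman, the same request can be made from Python. Here is a minimal sketch using only the standard library; the helper names are my own, and it assumes the container from above is still running on localhost:8501:

```python
import json
import urllib.request

def build_payload(instances):
    """TensorFlow Serving's REST predict API expects {"instances": [...]}."""
    return json.dumps({"instances": instances}).encode("utf-8")

def predict(model, instances, host="localhost:8501"):
    """POST the payload to the model's :predict endpoint and return the predictions."""
    request = urllib.request.Request(
        f"http://{host}/v1/models/{model}:predict",
        data=build_payload(instances),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["predictions"]

# With the container running, predict("half_plus_two", [1.0, 2.0])
# should return [2.5, 3.0], i.e. x / 2 + 2 applied to each instance.
```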

Before proceeding, we want to make sure to clean up the docker container as we will be redeploying our own models in a few minutes. Run the following:

$ docker kill serving

Deploy a custom model

Next, we need to serve a model that does something more useful than the out-of-the-box half_plus_two model. Before proceeding, make sure you have a TensorFlow .pb model ready to go. In my previous post I set up an autoencoder and a classifier for processing MNIST images; you can take a look at the post or grab the source code here.

My current working directory contains both the serving repository and my TensorFlow project folder and looks like this:

├── tf-mnist-project
│ ├── src
│ │…….
├── serving
│ ├── tensorflow_serving
│ │…….

You might have to adjust my commands to reflect your own folder structure.

Let’s create a new mnist folder in the serving repository to house our models, along with a model subdirectory for our first model.

$ mkdir serving\mnist
$ mkdir serving\mnist\autoencoder

You should now see the 2 newly created folders. Now let’s copy our first model into the model subdirectory.

$ cp tf-mnist-project\models\autoencoder\production\1 serving\mnist\autoencoder -r

Our folder structure should now look something like this:

├── tf-mnist-project
│ ├── src
│ │…….
├── serving
│ ├── tensorflow_serving
│ ├── mnist
│ │ ├── autoencoder
│ │ │ ├── 1
│ │ │ │ ├── assets
│ │ │ │ ├── variables
│ │ │ │ └── saved_model.pb
│ │…….

Let’s try mounting the new model and launching the docker container.

Linux — launching the container

#update the path to our new model 
$ TESTDATA="$(pwd)/serving/mnist"
$ docker run --rm -p 8501:8501 \
    --name serving \
    --mount type=bind,source=$TESTDATA/autoencoder,target=/models/autoencoder \
    -e MODEL_NAME=autoencoder \
    tensorflow/serving

Windows — launching the container

#map the path to the new model 
$ set TESTDATA=%cd%/serving/mnist
$ docker run --rm -p 8501:8501 --name serving --mount type=bind,source="%TESTDATA%"/autoencoder,target=/models/autoencoder -e MODEL_NAME=autoencoder tensorflow/serving

Let's pause here and make sure we understand what we are asking docker to do with our command because we are about to run into a problem trying to expose both of our models simultaneously.

--rm
make sure the container is automatically cleaned up on exit
-p 8501:8501
map the docker internal port 8501 to the external port 8501
--name serving
give a name to our container for easier identification and termination
--mount type=bind,source=$TESTDATA/autoencoder,target=/models/autoencoder
mount the contents of the autoencoder folder; this is required for the serving application to locate the correct model
-e MODEL_NAME=autoencoder
pass the MODEL_NAME environment variable to the serving application to help it locate the correct model

Hopefully, after launching the latest docker container with the above command, you see output without any errors, indicating that our custom model is up and running correctly. Again we will confirm this with a Postman request. This time the POST endpoint is localhost:8501/v1/models/autoencoder:predict, and the feature vector should have the dimensions [1, 784] (28x28 pixels). I made a sample payload of 784 1.0 values, you can grab it below.
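If you would rather generate a payload like that yourself, a short Python snippet will do; the [1, 784] shape matches the flattened 28x28 input described above:

```python
import json

# One sample of 784 features (a flattened 28x28 image), all set to 1.0
features = [[1.0] * 784]
payload = json.dumps({"instances": features})

# Paste the resulting string into the body of the POST request;
# the response should contain one 784-element reconstruction per instance.
```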

Here is the sample digit I used.

After sending a request via Postman we can see that the model is up, running, and returning a [1, 784] response as expected!

We can even plot the digits to visualize the result using a function like this:

import matplotlib.pyplot as plt
import numpy as np

def compare_digits(raw, processed):
    """Stack the raw digit on top of its processed version and plot both."""
    image = np.append(raw, processed)
    image = np.array(image, dtype='float')
    pixels = image.reshape((56, 28))  # two 28x28 digits stacked vertically
    plt.imshow(pixels, cmap='gray')
    plt.show()

# Example: compare the original digit against the autoencoder's output
# compare_digits(raw_digit, response["predictions"][0])

And you should see something like this:

Again, before moving on, we want to make sure to clean up the docker container:

$ docker kill serving

Deploy multiple custom models simultaneously

Our serving application is able to serve one model, but in a production environment we would typically expect multiple models to be available at the same time. You will notice that in our original docker run command we mounted our model folder and passed -e MODEL_NAME=<model>. Mounting multiple model folders can be done without issue, but passing multiple model names cannot be done directly in this way. To get around this limitation we can store our model information in a configuration file and provide it to the serving application.

Before creating the config file, we need to add a folder for the second model and one to store the config file, then copy the built classifier model:

$ mkdir serving\mnist\classifier
$ mkdir serving\mnist\config
$ cp tf-mnist-project\models\classifier\production\1 serving\mnist\classifier -r

Now let's create a models.config file in the config directory. This is what it should look like:

model_config_list: {
  config: {
    name: "classifier",
    base_path: "/models/classifier",
    model_platform: "tensorflow"
  },
  config: {
    name: "autoencoder",
    base_path: "/models/autoencoder",
    model_platform: "tensorflow"
  }
}

The new folder structure should now look like this:

├── tf-mnist-project
│ ├── src
│ │…….
├── serving
│ ├── tensorflow_serving
│ ├── mnist
│ │ ├── autoencoder
│ │ │ ├── 1
│ │ │ │ ├── assets
│ │ │ │ ├── variables
│ │ │ │ └── saved_model.pb
│ │ ├── classifier
│ │ │ ├── 1
│ │ │ │ ├── assets
│ │ │ │ ├── variables
│ │ │ │ └── saved_model.pb
│ │ ├── config
│ │ │ ├── models.config
│ │…….

Let's launch docker again, this time mounting both model folders along with the configuration file.

Linux — launching the container

$ TF_MODEL_DIR="$(pwd)/serving/mnist" 
$ docker run \
--rm -p 8501:8501 --name serving \
--mount type=bind,source=$TF_MODEL_DIR/classifier,target=/models/classifier \
--mount type=bind,source=$TF_MODEL_DIR/autoencoder,target=/models/autoencoder \
--mount type=bind,source=$TF_MODEL_DIR/config,target=/config tensorflow/serving \
--model_config_file=/config/models.config

Windows — launching the container

$ set TF_MODEL_DIR=%cd%/serving/mnist 
$ docker run --rm -p 8501:8501 --name serving --mount type=bind,source=%TF_MODEL_DIR%/classifier,target=/models/classifier --mount type=bind,source=%TF_MODEL_DIR%/autoencoder,target=/models/autoencoder --mount type=bind,source=%TF_MODEL_DIR%/config,target=/config tensorflow/serving --model_config_file=/config/models.config

Notice we are mounting three folders: one for each model and one that stores our configuration file. We then tell the serving application, via --model_config_file=/config/models.config, to use the models.config from the mounted config folder to figure out how to map our models to the other two folders. The output should look like the below (take note that both of our models are now up and running).

We now have two models accessible on:

localhost:8501/v1/models/autoencoder:predict
localhost:8501/v1/models/classifier:predict

If you rerun the autoencoder test from above you will see that the model is still functional, and if you forward the response to the classifier endpoint you should see something like this:

As you can see, the classifier model is also up and running and has correctly classified our sample digit as a 3.
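This autoencoder-then-classifier hand-off can also be scripted end to end. The sketch below assumes both models are being served on localhost:8501 as configured above; the helper names are my own, and the predict_fn parameter is only there so the chaining logic can be exercised without a live server:

```python
import json
import urllib.request

def predict(model, instances, host="localhost:8501"):
    """POST an 'instances' payload to a model's :predict endpoint."""
    request = urllib.request.Request(
        f"http://{host}/v1/models/{model}:predict",
        data=json.dumps({"instances": instances}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["predictions"]

def denoise_and_classify(digit, predict_fn=predict):
    """Run a flattened [784] digit through the autoencoder, then classify the result."""
    reconstructed = predict_fn("autoencoder", [digit])
    return predict_fn("classifier", reconstructed)
```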

I converted my serving module into a personal repository; you can check it out here (if you clone my repository you will need to change some of the path values, as the root folder name will no longer be serving). Now that we have our models up and running, the next step is to set up a basic external application to utilize the API endpoints and demonstrate the functionality.

Here is a summary of the components involved in this project:

  1. Introduction
  2. The Models : git repo → tf-mnist-project
  3. Serving Models : git repo → tf-serving-mnist-project
  4. The User Interface : git repo → angular-mnist-project

