Image segmentation using Deep Learning for e-commerce applications

Project Setup

Goal of the Project

The Problem: One of the major factors that hold people back when shopping online for furniture and home decor products is that they cannot touch the product, see it in person or get a feel for how it would fit into their home. The user has to rely solely on the product images posted on e-commerce websites, which makes them more hesitant to buy such products online.

The Solution: The project aims to address this issue by providing an option for the user to segment the product from the image and then visualise how it fits in their own home. This is done by using an Image Segmentation Model which can separate the predefined set of product categories from the rest of the image, which can then be placed on a video stream of the user's room for visualisation.

Challenges and Constraints

Major Challenges:

Limited Data Availability: One of the major challenges that most enterprises face when planning to adopt Machine Learning is the lack of the massive datasets needed to train powerful models. Although more data could have been acquired for this project by scraping more product images, this was deliberately avoided so that the project is constrained to a very limited dataset. Whether such a small dataset is enough to get satisfactory performance is one of the questions this project sets out to answer.
Non-reusability of pre-trained models: While many pre-trained Image Segmentation Models already exist, two major challenges prevent them from being used directly for this purpose,
- Custom classes: Most pre-trained models are trained on standard datasets such as MS-COCO and PASCAL-VOC, whereas this application needs to segment product categories which are not part of those datasets. As a result, a custom model needs to be built on top of a pre-trained model, and its final layers then have to be trained on a custom dataset.
- Model evaluation time: Pre-trained models are often massive since they have been trained to classify a wide variety of objects, which makes them slow. However, this application needs a really fast response time so that the user can view the segmented product in real time.
As a result, such pre-trained models cannot be used as they are for this application. Instead, a custom model which can perform image segmentation on custom product categories with a very small size and evaluation time needs to be developed, which forms the goal of this project.

End Result

The end result of this project is shown below. The screen capture (from the app deployed on AWS) shows an example use case: the user selects a product on a typical e-commerce website, the Image Segmentation Model returns the segmented product, and the user can then adjust the size and position of the product placed on the video stream of their room.

Project Considerations

Before deciding to use Machine Learning in any application, a number of factors need to be considered, such as the business purpose the project serves, the project constraints, the performance constraints and how the system performance will be evaluated. The following block diagram describes all the major considerations.

Data Considerations and Pipeline

Data Considerations

One of the major challenges in a Machine Learning project is to handle the various parts of the Data Pipeline such as Data Collection, Data Storage, Data Pre-processing and Representation, Data Privacy, Bias in Data. It is important to handle these aspects of Data pipeline and the following block-diagram answers all the questions regarding handling data.

Data Pipeline

After scraping product images and obtaining reference masks using a pre-trained model, the data is stored in a JSON file (paths to images and masks). Pre-processing is then performed on this dataset, after which it is divided into 3 sets,

Data	Purpose	Number of Images
Training	To fit Model	285
Validation	To tune hyperparameters	95
Test	To evaluate model performance	95

As can be observed, this is a very small dataset which is to be used to fine-tune the final layers of a pre-trained Image Segmentation Model.
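The split described above can be sketched as follows. This is a minimal illustration rather than the project's own code: the record layout (`image` and `mask` keys), the file names and the fixed seed are assumptions, and only the 285/95/95 sizes come from the table.

```python
import json
import random


def split_records(records, n_train=285, n_val=95, seed=42):
    """Shuffle image/mask records and split them into train/val/test lists."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains (95 of 475 here)
    return train, val, test


# Hypothetical manifest in the spirit of the JSON file described above:
# each entry stores the path to an image and the path to its reference mask.
manifest_json = json.dumps(
    [{"image": f"images/{i:04d}.jpg", "mask": f"masks/{i:04d}.png"} for i in range(475)]
)
records = json.loads(manifest_json)
train, val, test = split_records(records)
print(len(train), len(val), len(test))  # 285 95 95
```

Keeping the split seeded means the Test set stays the same across experiments, so results from different runs remain comparable.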

Modeling: Fitting and Training Deep learning models

Model Evaluation Metrics

The two metrics often used for Image segmentation tasks are

Pixel-wise Classification Accuracy: Each pixel is regarded as belonging to a class (background or one of the product categories), and the metric is the fraction of pixels whose predicted class matches the reference mask.
Intersection over Union (IoU): As described in the TensorFlow documentation, IoU is defined as IoU = true_positive / (true_positive + false_positive + false_negative).

In this project, both metrics are tracked while fitting the model to follow the progress of training. The second metric, IoU, is used to compare model performance on the Test dataset.
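To make the two metrics concrete, here is a small pure-Python sketch that computes both on toy masks. The function names and the 3x4 example masks are made up for illustration; in the project itself these values would come from the framework's metric implementations.

```python
def pixel_accuracy(true_mask, pred_mask):
    """Fraction of pixels whose predicted class equals the reference class."""
    pairs = [(t, p) for t_row, p_row in zip(true_mask, pred_mask)
             for t, p in zip(t_row, p_row)]
    return sum(t == p for t, p in pairs) / len(pairs)


def iou(true_mask, pred_mask, class_id):
    """IoU for one class: TP / (TP + FP + FN)."""
    tp = fp = fn = 0
    for t_row, p_row in zip(true_mask, pred_mask):
        for t, p in zip(t_row, p_row):
            if p == class_id and t == class_id:
                tp += 1   # predicted the class and it is correct
            elif p == class_id:
                fp += 1   # predicted the class where the reference has another
            elif t == class_id:
                fn += 1   # missed a pixel that belongs to the class
    denom = tp + fp + fn
    return tp / denom if denom else float("nan")  # class absent from both masks


# Toy masks: 0 = background, 1 = product.
true_mask = [[0, 0, 1, 1],
             [0, 1, 1, 1],
             [0, 0, 1, 0]]
pred_mask = [[0, 1, 1, 1],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]

print(pixel_accuracy(true_mask, pred_mask))  # 0.75 (9 of 12 pixels agree)
print(iou(true_mask, pred_mask, class_id=1))  # 4 / 7, about 0.571
```

The example also shows why IoU is the stricter metric: the background pixels inflate the accuracy to 0.75, while the IoU for the product class is only about 0.57.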

Baseline Models for Model Comparison

The following are some of the popular Image Segmentation Models. There are two constraints due to which these models cannot be used in this application.

These models are trained on standard datasets and cannot perform segmentation on the custom product categories used in this application.
The Model size and evaluation time do not fit the project specifications. To be used in an e-commerce website, the model needs to be small and its evaluation time very low, so that a user can view the segmented product instantly.

Model	Size	Evaluation Time	IoU
Detectron 2	178 MB	0.10 seconds	0.89
Mask RCNN	170 MB	6.29 seconds	0.66

The goal of this project is to build a custom model which serves two purposes specifically,

It should be able to perform Image segmentation on custom product categories.
It should be small in size with a very low evaluation time per image.

The following sections will describe the building, training and evaluation of such a custom model.

Transfer Learning: Fine-tuning pre-trained model

Although a new custom model is necessary for this application, it does not need to be built from scratch. Instead, a pre-trained model (trained on a massive image dataset such as ImageNet) can be used as the base for the custom model, with additional layers added on top. During the first stage of training, the weights and biases of the base model are frozen and only the newly added final layers are updated. This process is referred to as Transfer Learning. Afterwards, all the layers of the custom model can be trained with a small learning rate. This is referred to as Fine-tuning a pre-trained model.

The code block to build such a model is shown here,

Image Segmentation Model: Custom Layers on top of MobileNet Base Model

The Custom Model is built with the following components,

Base Model: MobileNet V2 pre-trained on ImageNet
Custom Layers: Decoder layers of U-Net

The input image is first passed through the downsampling layers of MobileNet and then upsampled (with skip connections) through the decoder layer stack. All the layers of this Custom Model can be seen in the following figure.
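A minimal Keras sketch of this architecture is given below. It is not the project's own code: the 128x128 input size, the decoder filter counts, the use of `Conv2DTranspose` blocks for upsampling and the choice of skip-connection layers are all assumptions, and `num_classes` is left as a parameter because the number of product categories is not stated here. The layer names are those exposed by `tf.keras.applications.MobileNetV2`.

```python
import tensorflow as tf


def build_unet(num_classes, input_size=128, weights="imagenet"):
    """MobileNetV2 encoder (frozen) with a U-Net style decoder on top."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(input_size, input_size, 3), include_top=False, weights=weights
    )
    # Encoder activations reused as skip connections, from high to low resolution.
    skip_names = [
        "block_1_expand_relu",   # 1/2 of the input resolution
        "block_3_expand_relu",   # 1/4
        "block_6_expand_relu",   # 1/8
        "block_13_expand_relu",  # 1/16
        "block_16_project",      # 1/32 (bottleneck)
    ]
    encoder = tf.keras.Model(
        inputs=base.input, outputs=[base.get_layer(n).output for n in skip_names]
    )
    encoder.trainable = False  # transfer learning stage: keep pre-trained weights fixed

    inputs = tf.keras.Input(shape=(input_size, input_size, 3))
    *skips, x = encoder(inputs)

    # Decoder: upsample by 2, then concatenate with the matching encoder feature map.
    for filters, skip in zip([512, 256, 128, 64], reversed(skips)):
        x = tf.keras.layers.Conv2DTranspose(
            filters, 3, strides=2, padding="same", use_bias=False
        )(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.ReLU()(x)
        x = tf.keras.layers.Concatenate()([x, skip])

    # Final upsampling back to the input resolution: one logit per class per pixel.
    outputs = tf.keras.layers.Conv2DTranspose(
        num_classes, 3, strides=2, padding="same"
    )(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```

Because the model outputs raw logits, it would be compiled with a loss such as `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`. For the later fine-tuning stage, the encoder would be unfrozen and the model recompiled with a smaller learning rate.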

Model Training

Following are the Model parameters,

Model: U-Net (with MobileNet V2 as base model)
Optimiser: Adam
Loss: Sparse Categorical Cross entropy
Metrics: Accuracy (Pixel wise multi class classification)

The model is trained with the above parameters. The improvements in accuracy and loss as the model trains are shown in the following figures.

Learning Curves Observation: Both training and validation accuracy improve over the early epochs and then saturate. However, a reasonably significant gap remains between the two curves even after many epochs, which suggests that there is still some overfitting in the model.

Hyperparameter tuning

So far, the model has been trained on one given set of parameters. However in order to find out the best set of parameters, the model will need to be evaluated on several different combinations of parameters. This process is referred to as Hyper-parameter tuning. In order to avoid using the Test Data until final evaluation, a separate split of data known as Validation Dataset will be used in order to evaluate model with different sets of hyper-parameters.

TensorFlow makes it easy to perform Hyper-parameter tuning using TensorBoard. Different sets of parameters are pre-defined and the results are logged to TensorBoard, the code for which is shared here,

In this experiment, the performance of the Model is tested with the following options for the optimiser: Adam, SGD and RMSProp. The results, as seen on TensorBoard, are presented here,
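One way such an experiment could be logged is with the HParams plugin that ships with TensorBoard, sketched below. The log directory layout, the `val_accuracy` tag and the `train_and_evaluate` callback (a function that trains the model with the given optimiser name and returns a validation score) are hypothetical names introduced for this sketch.

```python
import os

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

# The three optimiser options compared in this experiment.
HP_OPTIMIZER = hp.HParam("optimizer", hp.Discrete(["adam", "sgd", "rmsprop"]))


def run_trial(run_dir, hparams, train_and_evaluate):
    """Log one hyperparameter combination and its validation score."""
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record which values this run used
        score = train_and_evaluate(hparams[HP_OPTIMIZER])
        tf.summary.scalar("val_accuracy", score, step=1)
    return score


def tune(log_root, train_and_evaluate):
    """Try every optimiser option and return {optimiser_name: score}."""
    results = {}
    for name in HP_OPTIMIZER.domain.values:
        run_dir = os.path.join(log_root, f"run-{name}")
        results[name] = run_trial(run_dir, {HP_OPTIMIZER: name}, train_and_evaluate)
    return results
```

Each run writes to its own sub-directory, so pointing TensorBoard at `log_root` shows the runs side by side in the HParams dashboard. The score passed in should be computed on the Validation dataset, never on the Test dataset.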

TensorFlow Profiler

During training, one of the aspects to be monitored is the amount of time it takes to compute the various stages of the data pipeline. Inspecting this will help in identifying the most time intensive section of the pipeline which can then be addressed in order to reduce the total training time.

TensorFlow's Profiler makes it easier to track various metrics such as Time Taken on Host, Time Taken on Device, Time Spent for various operations on GPU, Memory Profile and Performance Summary, the stats for which are presented in the following figures.

TensorFlow Input Data Pipeline

In order to reduce the time spent in the Input Data Pipeline and improve the memory footprint, a number of steps can be adopted as suggested in TensorFlow's Input Data Pipeline Guide. The following operations are used in this project,

Prefetching: Prefetching overlaps the preprocessing and model execution of a training step. While the model is executing training step n, the input pipeline is reading the data for step n+1. Doing so reduces the step time to the maximum (as opposed to the sum) of the training time and the time it takes to extract the data.
Parallel mapping: When preparing data, input elements may need to be pre-processed. To this end, the map transformation, which applies a user-defined function to each element of the input dataset is used. Because input elements are independent of one another, the pre-processing can be parallelized across multiple CPU cores.
Caching: Caching a dataset, either in memory or on local storage will save some operations (like file opening and data reading) from being executed during each epoch.
Batching: Invoking a user-defined function passed into the map transformation has overhead related to scheduling and executing the user-defined function. By applying the batch transformation before the map transformation the user-defined function can operate over a batch of inputs at once.
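The four operations above can be combined in a single `tf.data` pipeline, as in the sketch below. It assumes the images and masks are already loaded as in-memory tensors and uses a simple normalisation as the user-defined function; the function names, batch size and shuffle buffer are illustrative choices, not the project's actual settings.

```python
import tensorflow as tf


def normalize_batch(images, masks):
    """Scale a whole batch of uint8 images to [0, 1]; masks pass through."""
    return tf.cast(images, tf.float32) / 255.0, masks


def make_pipeline(images, masks, batch_size=32, training=True):
    ds = tf.data.Dataset.from_tensor_slices((images, masks))
    ds = ds.cache()                     # caching: reuse loaded elements in later epochs
    if training:
        ds = ds.shuffle(buffer_size=len(images))
    ds = ds.batch(batch_size)           # batching before map: map runs once per batch
    ds = ds.map(normalize_batch,        # parallel mapping across CPU cores
                num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.prefetch(tf.data.AUTOTUNE)  # prefetching: prepare step n+1 during step n
    return ds
```

With file-based data, the `cache()` call would normally sit after the step that reads and decodes the files, since that is the work worth saving between epochs.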

The timeline for various Input Data Pipeline operations as seen in TensorFlow's Trace Viewer is presented here,

Logging data to TensorBoard

In addition to the Loss and Accuracy metrics, other custom data can be logged to TensorBoard during Model training. Since this project uses Intersection Over Union (IoU) as one of the performance metrics, the IoU value on the validation dataset was logged to TensorBoard at the end of each epoch, the code for which is presented below,

In the following figure, we can observe the improvements in IoU with more epochs during Model training.

IoU during Model Training

Model learning

The following figures show how the model improves its predicted segmentation mask as the training progresses. The first row shows the Input Image and the Reference Mask. The following rows show the Predicted Masks at various epochs.

[Figure: Input Image, True Mask, and Predicted Masks at epochs 0, 2, 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50]

Model Comparison and Selection

The following figures compare the performance of our Custom Model against the Baseline Models. It can be observed that although there is a drop in IoU for the Custom Model, its evaluation time is about 1/10th of the best time for any of the Baseline Models. The IoU can be improved by training the Custom Model with a bigger dataset.

Let us revisit the table of our Baseline Models and now compare it with our Custom Model.

Model	Size	Evaluation Time	IoU
Detectron 2	178 MB	0.10 seconds	0.89
Mask RCNN	170 MB	6.29 seconds	0.66
Custom Model	64 MB	0.01 seconds	0.59

As we can observe, the trained Custom Model has served the purposes required for this application,

Custom Product Categories: The Custom Model can segment all the product categories used in this application.
Model Size: The Custom Model is less than half the size of the other Baseline models which makes it easier to be deployed in production pipelines.
Model Evaluation Time: The Custom Model takes about 1/10th of the best time for any of the Baseline models. Due to the model being used in real time predictions, the Model Evaluation Time was one of the major challenges before starting this project, and with this performance, the project has successfully achieved its goal.
Acceptable IoU: While the IoU is lower than that of the Baseline Models, this can be improved by training the Custom Model with a bigger dataset.
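The size and speed claims above follow directly from the comparison table, as the short check below shows. The numbers are copied from the table; the dictionary layout and variable names are only for illustration.

```python
baselines = {
    "Detectron 2": {"size_mb": 178, "seconds": 0.10, "iou": 0.89},
    "Mask RCNN": {"size_mb": 170, "seconds": 6.29, "iou": 0.66},
}
custom = {"size_mb": 64, "seconds": 0.01, "iou": 0.59}

fastest = min(b["seconds"] for b in baselines.values())   # best baseline time
smallest = min(b["size_mb"] for b in baselines.values())  # smallest baseline

time_ratio = custom["seconds"] / fastest    # custom time relative to the fastest baseline
size_ratio = custom["size_mb"] / smallest   # custom size relative to the smallest baseline
iou_gap = min(b["iou"] for b in baselines.values()) - custom["iou"]

print(round(time_ratio, 2))  # 0.1
print(round(size_ratio, 2))  # 0.38
print(round(iou_gap, 2))     # 0.07
```

So the Custom Model runs in about one tenth of the time of the fastest baseline and is under 40% of the size of the smallest one, at the cost of an IoU 0.07 below the weaker baseline and 0.30 below the stronger one.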

Post Training Model Quantisation

Before using the trained model in production, it is useful to further reduce the Model Size using Model Quantisation. Of the various techniques available, this project uses Float16 Quantisation, which reduces the size of a floating point model by quantising the weights to float16 (the IEEE standard for 16-bit floating point numbers). The code snippet to achieve this is shared below,

As can be noticed in the following figure, this operation cuts the model size in half with minimal loss in accuracy.
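A minimal sketch of float16 post-training quantisation with the TensorFlow Lite converter is shown below. It assumes the trained model is available as a Keras model; the helper names are made up for this example, and the float32 conversion is included only so that the two sizes can be compared.

```python
import tensorflow as tf


def convert_float32(model):
    """Convert a Keras model to a TensorFlow Lite flatbuffer without quantisation."""
    return tf.lite.TFLiteConverter.from_keras_model(model).convert()


def quantize_float16(model):
    """Convert a Keras model to a TensorFlow Lite flatbuffer with float16 weights."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]  # store weights as float16
    return converter.convert()  # serialised model as bytes
```

Both helpers return the serialised model as `bytes`, so comparing `len(quantize_float16(model))` with `len(convert_float32(model))` gives the size reduction, and the result can be written to disk with an ordinary binary file write.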

Saving Model

The trained and quantised Model then has to be saved in order to be used in Production. As shown in the following code block, the Model can be saved either in .h5 format or in TensorFlow's SavedModel format.

Model Predictions

Finally, the following figures present a few sample images of various product categories from the Test Dataset and the corresponding Segmented Products (based on Segmentation Masks) produced by the Model.
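How a segmented product can be derived from a segmentation mask is sketched below with NumPy. The approach shown, turning the mask into an alpha channel so that everything except the product becomes transparent, is an assumption about how the cut-out could be produced, and the function names are hypothetical.

```python
import numpy as np


def logits_to_mask(logits):
    """Pick the most likely class for every pixel (H x W x C -> H x W)."""
    return np.argmax(logits, axis=-1)


def cut_out(image, mask, class_id):
    """Return an RGBA image where only pixels of `class_id` are opaque."""
    alpha = np.where(mask == class_id, 255, 0).astype(np.uint8)
    return np.dstack([image, alpha])  # H x W x 3 plus H x W -> H x W x 4


# Toy 2x2 example: class 1 is the product, class 0 the background.
image = np.full((2, 2, 3), 200, dtype=np.uint8)
logits = np.array([[[2.0, 0.1], [0.3, 1.5]],
                   [[0.2, 0.9], [1.1, 0.4]]])
mask = logits_to_mask(logits)
rgba = cut_out(image, mask, class_id=1)
print(rgba[..., 3])  # alpha channel: 255 where the product is, 0 elsewhere
```

An RGBA cut-out like this can be overlaid directly on another image or video frame, since the transparent background pixels leave the underlying scene visible.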

It can be observed that there is still a lot of room for improvement in the predictions. One possible solution would be to train the model on a bigger dataset, since the dataset used in this project was very small, with a limited number of images for each product category.

Deployment, Serving and Production

You need to think about what feedback you would like to get from your users, whether to allow users to suggest better predictions, and how to infer from user reactions whether your model does a good job. You should also think about where to run inference, on the user's device or on the server, and the tradeoffs between the two.

Model deployment using Flask Application

A Flask Webapp is developed in order to serve the Model Predictions and showcase the capabilities of the project. The following code block shows how the model is used to get inference for a given input image.

Serving Model Predictions: REST API as Web Service

The Model predictions can also be served as a Web Service by using REST API. The following code snippet shows how this can be accomplished. The model output is returned as a JSON object.
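A minimal sketch of such a REST endpoint in Flask is given below. The `/predict` route, the JSON payload layout (an image sent as a nested list of RGB values) and the response fields are assumptions made for this example, and `predict_mask` is only a placeholder standing in for the trained segmentation model.

```python
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_mask(image):
    """Placeholder for the trained model: labels bright pixels as class 1."""
    return (image.mean(axis=-1) > 127).astype(int)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)
    if not payload or "image" not in payload:
        return jsonify({"error": "expected JSON with an 'image' field"}), 400
    image = np.asarray(payload["image"], dtype=np.uint8)  # H x W x 3 nested list
    mask = predict_mask(image)
    return jsonify({
        "mask": mask.tolist(),           # per-pixel class ids as nested lists
        "height": int(mask.shape[0]),
        "width": int(mask.shape[1]),
    })
```

Sending pixels as nested JSON lists keeps the example short but is inefficient for real images; an encoded image file in the request body would be a more practical choice.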

The following figure shows how the Model predictions can be obtained using the above REST API.

Model in Production: Flask, Docker, AWS

The production pipeline consists of the following components,

Flask Webapp: Webapp and REST API to serve Model Predictions
Docker: Containerised Flask Webapp which can then be deployed in any environment
AWS: CI/CD Pipeline

ECR Repository: The Docker Image is stored in this repository. Any changes to this image will trigger changes in the rest of the pipeline and the updates to the image will then be deployed to the Web Application.
CodeCommit: The pipeline is configured to use a source location where the following two files are stored,
- Amazon ECS Task Definition file: The task definition file lists Docker image name, container name, Amazon ECS service name, and load balancer configuration.
- CodeDeploy AppSpec file: This specifies the name of the Amazon ECS task definition file, the name of the updated application's container, and the container port where CodeDeploy reroutes production traffic.
CodeDeploy: Used during deployment to reference the correct deployment group, target groups, listeners and traffic rerouting behaviour. CodeDeploy uses a listener to reroute traffic to the port of the updated container specified in the AppSpec file.
- ECS Cluster: Cluster where CodeDeploy routes traffic during deployment
- Load Balancer: The load balancer uses a VPC with two public subnets in different Availability Zones.

Conclusion: FAQ, Challenges and Learnings

Why this Project?
- The Problem: One of the major factors that hold people back when shopping online for furniture and home decor products is that they cannot touch the product, see it in person or get a feel for how it would fit into their home. The user has to rely solely on the product images posted on e-commerce websites, which makes them more hesitant to buy such products online.
- The Solution: The project aims to address this issue by providing an option for the user to segment the product from the image and then visualise how it fits in their own home. This is done by using an Image Segmentation Model which can separate the predefined set of product categories from the rest of the image, which can then be placed on a video stream of the user's room for visualisation.
What were the major project constraints?
- Limited Data Availability: One of the major challenges that most enterprises face when planning to adopt Machine Learning is the lack of the massive datasets needed to train powerful models. Although more data could have been acquired for this project by scraping more product images, this was deliberately avoided so that the project is constrained to a very limited dataset. Whether such a small dataset is enough to get satisfactory performance is one of the questions this project set out to answer.
- Non-reusability of pre-trained models: While many pre-trained Image Segmentation Models already exist, two major challenges prevent them from being used directly for this purpose,
  - Custom classes: Most pre-trained models are trained on standard datasets such as MS-COCO and PASCAL-VOC, whereas this application needs to segment product categories which are not part of those datasets. As a result, a custom model needs to be built on top of a pre-trained model, and its final layers then have to be trained on a custom dataset.
  - Model evaluation time: Pre-trained models are often massive since they have been trained to classify a wide variety of objects, which makes them slow. However, this application needs a really fast response time so that the user can view the segmented product in real time.
  As a result, such pre-trained models could not be used as they are for this application. Instead, a custom model which can perform image segmentation on custom product categories with a very small size and evaluation time had to be developed, which formed the goal of this project.
How did you collect data and label it?
- The Data was collected by scraping product images from ecommerce websites.
- Labels were obtained using Detectron2 Model but this model in itself could not be used in the application due to its bigger size and higher evaluation time.
What training procedure did you follow?
- Building Image segmentation model using MobileNet as base model
- Fine-tuning last layer of pre-trained model
- Tuning Model Hyperparameters
- Using TensorBoard to monitor metrics during Model Training
- Using Profiler to inspect and identify memory and time intensive sections of the pipeline.
Why did you use the model you ended up using?
- Custom Product Categories: The Custom Model can segment all the product categories used in this application.
- Model Size: The Custom Model is less than half the size of the other Baseline models which makes it easier to be deployed in production pipelines.
- Model Evaluation Time: The Custom Model takes about 1/10th of the best time for any of the Baseline models. Due to the model being used in real time predictions, the Model Evaluation Time was one of the major challenges before starting this project, and with this performance, the project has successfully achieved its goal.
- Acceptable IoU: While the IoU is lower than that of the Baseline Models, this can be improved by training the Custom Model with a bigger dataset.
What were the major challenges and learnings through this project?
- A lot of Image Segmentation Models seem to be pre-trained on PASCAL VOC but not on the MS-COCO dataset. This means that for the categories defined in MS-COCO but not in PASCAL VOC, such pre-trained models cannot be used for off-the-shelf inference. Instead, the final layers of such models need to be trained using the custom dataset of the application.
- In the Modelling stage of the project, TensorBoard offers a number of useful features to monitor the progress of training Models.
  - Profiler can be used to examine the efficiency of the Input data pipeline and to determine which layers of the Model involve the most time and memory intensive operations.
  - Custom data can be logged to TensorBoard at the end of each epoch.
  - It is very convenient to tune Hyper-parameters of the Model using the functionalities of TensorBoard.
What further work can be done to improve?
- More data can be collected and the Model can then be trained on a bigger dataset. This would improve image segmentation results.