Machine learning and deep learning are among the most widely used technologies of our era, yet many models never get upgraded to their full potential and remain second-rate. This happens because of the manual work humans do, like changing parameters or adding layers: we can't predict how much of what to do, but machines and programs can. To overcome this, we have to automate the manual work by making our hyperparameters and other small details dynamic, using operational tools.
This practice of putting an ML/DL model through continuous integration and continuous delivery is, in layman's terms, known as MLOps.
Today my task is to see this in action, integrated with metrics monitoring.
Task
- Create a container image that has Python3 and Keras or NumPy installed, using a Dockerfile
- When we launch this image, it should automatically start training the model in the container.
- Create a job chain of job1, job2, job3, job4 and job5 using the Build Pipeline plugin in Jenkins
- Pull the GitHub repo automatically when a developer pushes to GitHub.
- By looking at the code or program file, Jenkins should automatically start a container with the respective machine learning software and interpreter already installed, deploy the code, and start training (e.g. if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing).
- Train the model and report its accuracy/metrics.
- If accuracy is less than 80%, tweak the machine learning model architecture.
- Retrain the model, or notify that the best model has been created
- If the container where the app is running fails for any reason, this job should automatically restart the container from where the last trained model left off.
Dockerfile
# docker build -t image_name:tag .
We can build the image from our Dockerfile with the command above.
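A minimal sketch of such a Dockerfile, assuming python:latest as the base image and pip for installing the libraries (the exact packages and versions are my assumptions):
FROM python:latest
# Libraries the model needs (package choice is an assumption)
RUN pip install --no-cache-dir numpy tensorflow keras
# python:latest starts the Python 3 interpreter by default; switching the
# entrypoint to bash makes the container behave like a shell (see the
# "Making Model Service Permanent" section below)
ENTRYPOINT ["/bin/bash"]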
JOB 1
This job pulls the GitHub repo and copies it to a directory that will be mounted as a volume in our container (see the sketch below).
I have used Poll SCM, which makes Jenkins periodically check the GitHub repo for changes; as soon as new changes appear, it pulls them.
In place of Poll SCM we can use GitHub webhooks, with ngrok for tunneling, as shown in the previous blog.
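A minimal sketch of the Job 1 build step, assuming the repo is copied from the Jenkins workspace into /root/MLops/model/, the directory mounted into the containers in Job 2:
# Copy the freshly pulled repo from the Jenkins workspace into the
# directory that Job 2 mounts as a volume
sudo cp -rvf * /root/MLops/model/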
JOB 2
This job launches the desired environment by inspecting the model code.
Code Under Build
# Decide which environment to launch based on the code in the repo
if sudo grep -iE 'vgg|imagenet' /root/MLops/model/model.py
then
    echo "opening keras environment"
    # Remove any stale container before launching a fresh one
    if sudo docker ps -a | grep testkeras
    then
        sudo docker rm -f testkeras
    fi
    sudo docker run -dit --name testkeras -v /root/MLops/model/:/root/model/ keras:3
    sudo docker exec testkeras python3 /root/model/model.py
else
    echo "opening ML environment"
    if sudo docker ps -a | grep testml
    then
        sudo docker rm -f testml
    fi
    sudo docker run -dit --name testml -v /root/MLops/model/:/root/model/ ml
    sudo docker exec testml python3 /root/model/model.py
fi
JOB 3
This job is the ace. It reads the accuracy and validation accuracy; if either is below 80%, it tweaks the model using a Python script and some bash commands, as shown below.
Each tweak adds an additional dense layer to the model and changes hyperparameters such as the number of epochs, the batch size, the number of neurons, and the learning rate. After tweaking, it retrains the model and checks the accuracy again.
This repeats until the model achieves 80% accuracy or higher, producing the best model.
Code Under Build
#!/bin/bash
# Read the metrics written by model.py inside the container
a=$(sudo docker exec testkeras cat /root/model/accuracy.txt)
va=$(sudo docker exec testkeras cat /root/model/val_accuracy.txt)
echo "accuracy: $a"
echo "val_accuracy: $va"
epochs=3
batch=40
count=0
n1=300
n2=200
# Keep tuning until both accuracies reach 0.80, giving up after 20 rounds.
# bash cannot compare floats, so bc does the comparison.
while (( $(echo "$a < 0.80 || $va < 0.80" | bc -l) )) && (( count <= 20 ))
do
    ((epochs=epochs+2))
    ((count=count+1))
    ((batch=batch-3))
    ((n1=n1+5))
    ((n2=n2+10))
    # hypertuner.py adds an extra dense layer to model.py
    sudo docker exec testkeras python3 /root/model/hypertuner.py
    # Rewrite the hyperparameter lines in model.py with the new values
    sudo sed -i '/epochs_x=/c\epochs_x='$epochs /root/MLops/model/model.py
    sudo sed -i '/batch_size_x=/c\batch_size_x='$batch /root/MLops/model/model.py
    sudo sed -i '/n1=/c\n1='$n1 /root/MLops/model/model.py
    sudo sed -i '/n2=/c\n2='$n2 /root/MLops/model/model.py
    # Retrain and read the new metrics
    sudo docker exec testkeras python3 /root/model/model.py
    a=$(sudo docker exec testkeras cat /root/model/accuracy.txt)
    va=$(sudo docker exec testkeras cat /root/model/val_accuracy.txt)
done
The file hypertuner.py : https://github.com/mykg/ML-Ops-project/blob/master/hypertuner.py
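For reference, a minimal sketch of what such a tuning script could look like; this is my assumption of the approach, not the linked file itself. It inserts one extra Dense layer into model.py on every run, using a marker string that I assume exists in model.py:
# Sketch only: insert one extra Dense layer into model.py per run.
# The marker string "Dense(n1" is an assumption about model.py's contents.
MODEL_PATH = "/root/model/model.py"

with open(MODEL_PATH) as f:
    lines = f.readlines()

patched = []
for line in lines:
    patched.append(line)
    if "Dense(n1" in line:  # right after the first hidden layer
        patched.append("model.add(Dense(units=64, activation='relu'))\n")

with open(MODEL_PATH, "w") as f:
    f.writelines(patched)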
JOB 4
This is a monitoring job that keeps watching the environment. If the environment fails for any reason, the job launches a new one within a second, with the same configuration as before. And if any job fails, it sends an error mail to the engineer.
If you are having trouble sending the email, see: https://www.youtube.com/watch?v=DULs4Wq4xMg
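A minimal sketch of what the Job 4 build step could look like for the Keras environment (my assumption; the real job would cover both environments, and the error mail is configured through Jenkins' post-build email notification):
# If the training container died, relaunch it with the same config.
# Because /root/MLops/model/ is a mounted volume, the saved model and
# metrics survive, so training resumes from where it left off.
if sudo docker ps | grep testkeras
then
    echo "container is running fine"
else
    sudo docker rm -f testkeras || true
    sudo docker run -dit --name testkeras -v /root/MLops/model/:/root/model/ keras:3
    sudo docker exec testkeras python3 /root/model/model.py
fi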
Build Pipeline View
The Build Pipeline view gives a single place to watch the whole chain job1 → job2 → job3 → job4 → job5 run.
Making Model Service Permanent
You can make the model training permanent, so that every time the container starts, training begins. We could do this in the Dockerfile using CMD, but that only works when the model is already present at image build time. Instead, we can make it permanent by appending the execution command to .bashrc.
sudo docker exec testkeras bash -c 'echo "python3 /root/model/model.py" >> /root/.bashrc'
I used the python:latest image to build the Dockerfile; by default it starts the Python 3 interpreter, so the method above may not work there. But if you are using some other image whose entrypoint is a bash shell, you can go with the above.
We can change the entrypoint of the python:latest image from the Python 3 interpreter to bash, as done in the Dockerfile shown at the start.
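Alternatively (my addition, not part of the original setup), the entrypoint can be overridden at run time with docker's --entrypoint flag, without rebuilding the image:
# Override the default interpreter entrypoint with bash at run time
sudo docker run -dit --entrypoint /bin/bash --name testkeras -v /root/MLops/model/:/root/model/ python:latest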
Metrics Monitoring
This is an addition.
Here, I am monitoring metrics using Prometheus, Grafana and MLflow.
Metrics monitoring is very important when analyzing stats, as it gives a very clear visual representation of the data.
Prometheus
It is a well-known metrics monitoring system and time-series database.
To know more : https://prometheus.io/
Here, I am monitoring my localhost (RHEL), the Docker daemon, and Prometheus itself, as sketched below.
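A minimal sketch of the scrape targets in prometheus.yml, assuming node_exporter serves host metrics on port 9100 and the Docker daemon's metrics endpoint is enabled on port 9323 (both ports are assumptions about the setup):
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'localhost'
    static_configs:
      - targets: ['localhost:9100']  # node_exporter (assumption)
  - job_name: 'docker'
    static_configs:
      - targets: ['localhost:9323']  # docker daemon metrics (assumption)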
Grafana
Grafana is widely used with Prometheus because of its visuals and graph-based monitoring.
Here it is monitoring my localhost RAM/CPU usage, Docker daemon usage, and Prometheus.
To know more : https://grafana.com/
MLflow
MLflow offers a wide range of APIs for tracking metrics and other experiment data from ML/DL models.
To know more about mlflow go to: https://github.com/mlflow/mlflow/ and https://mlflow.org/docs/latest/quickstart.html
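A minimal sketch of how model.py could log its metrics to MLflow; the values here are placeholders for what the model actually computes, while mlflow.log_param and mlflow.log_metric are MLflow's standard tracking APIs:
import mlflow

# Placeholder values standing in for what model.py computes each run
epochs_x, batch_size_x = 3, 40
metrics = {"accuracy": 0.82, "val_accuracy": 0.79,
           "loss": 0.41, "val_loss": 0.55}

with mlflow.start_run():
    mlflow.log_param("epochs", epochs_x)
    mlflow.log_param("batch_size", batch_size_x)
    for name, value in metrics.items():
        mlflow.log_metric(name, value)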

The MLflow UI shows a graph of the accuracy metric.
It also shows all the metrics in one place: accuracy, validation accuracy, loss, and validation loss.
Conclusion
We have achieved a complete, end-to-end automated CI/CD pipeline for an ML/CNN model trained through transfer learning on pre-trained VGG16 weights.
All one has to do is push changes to the model; the rest of the work, which includes training the model, measuring its accuracy, and tuning the hyperparameters whenever the accuracy is below 80%, happens automatically. The system makes the required changes to the model so that its accuracy improves. There is absolutely no human involvement needed!
Thank you for coming this far, and comment down your views.