In this post we will learn how to serve a TensorFlow model using KServe. Setting up KServe requires a Kubernetes cluster, so we will use Minikube to run a single-node Kubernetes cluster locally. Both tools are introduced below before we walk through the installation and deployment steps.
Table of Contents:
- What is Minikube
- What is KServe
- Set up a single-node Kubernetes cluster using Minikube
- Install KServe on this cluster
- Create the InferenceService using kubectl
- Configure Ingress Host and Port
- Port Forwarding for testing
- Run the Prediction
What is Minikube
Minikube is a tool that allows you to run a single-node Kubernetes cluster on your local machine. It is designed to be a lightweight and easy-to-use solution for developers who want to experiment with Kubernetes, develop applications, or test deployments in a local environment without needing access to a full-scale Kubernetes cluster.
Here are some key features and components of Minikube:
- Local Kubernetes Cluster: Minikube creates a single-node Kubernetes cluster locally on your machine, providing a simple way to get started with Kubernetes development and testing.
- Cross-Platform Support: Minikube is compatible with various operating systems, including macOS, Linux, and Windows, making it accessible to a wide range of users.
- Support for Kubernetes Features: Minikube supports many of the core features of Kubernetes, including DNS, Dashboard, CNI (Container Network Interface), Ingress, ConfigMaps, Secrets, and more.
- Resource Management: It allows you to configure the amount of CPU and memory allocated to the Minikube virtual machine, helping you manage your local machine’s resources effectively.
- Multiple Runtimes: Minikube supports different container runtimes, including Docker, containerd, and CRI-O, allowing you to choose the runtime that best suits your needs.
- Add-ons: Minikube comes with a variety of add-ons that you can enable to extend its functionality. These add-ons include metrics-server for resource monitoring, ingress controllers, and various storage provisioners.
- Easy Start and Stop: Minikube provides simple commands to start, stop, and delete the Kubernetes cluster, making it convenient to manage the lifecycle of your local development environment (see the sketch after this list).
- Networking: It sets up the necessary networking components to allow communication between the local machine and the Kubernetes cluster, facilitating development and testing.
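For example, here is a minimal sketch of the lifecycle and add-on commands mentioned above; metrics-server is one of Minikube's built-in add-ons:
minikube start                          # create and start the local single-node cluster
minikube addons enable metrics-server   # enable the resource-monitoring add-on
minikube stop                           # stop the cluster while preserving its state
minikube delete                         # remove the cluster entirely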
Minikube is particularly useful for:
- Developers who want to learn and experiment with Kubernetes without needing a cloud-based cluster.
- Application development and testing in a controlled local environment.
- Demonstrations and educational purposes to teach Kubernetes concepts.
- Building and testing CI/CD pipelines locally before deploying to a production cluster.
Overall, Minikube offers a straightforward and accessible way to run Kubernetes locally, making it a valuable tool for developers and learners in the Kubernetes ecosystem.
We will walk through the Minikube installation steps in the sections below, after understanding what KServe is.
What is KServe
KServe is an open-source project designed for serving machine learning (ML) models on Kubernetes. It is part of the Kubeflow ecosystem, which provides tools and frameworks to streamline the deployment, orchestration, and management of ML models in production environments.
Here are some key features and components of KServe:
- Model Serving: KServe enables the deployment of machine learning models for real-time inferencing. It supports multiple frameworks such as TensorFlow, PyTorch, SKLearn, XGBoost, and ONNX.
- Multi-Model Serving: KServe allows serving multiple models on a single server, which can optimize resource utilization and reduce costs.
- Predictive Autoscaling: It can automatically scale model instances based on the incoming request load, ensuring high availability and efficient resource usage.
- Canary Rollouts: KServe supports canary rollouts, which allow for incremental deployment of model versions. This facilitates A/B testing and safe, gradual updates of models in production (a minimal sketch follows at the end of this section).
- Logging and Monitoring: It integrates with tools like Prometheus and Grafana for monitoring model performance and logging infrastructure to track predictions and errors.
- Inference Graphs: KServe supports creating inference graphs, enabling complex inferencing workflows where models can be composed together, such as chaining multiple models or adding pre/post-processing steps.
- Serverless Deployment: It leverages Kubernetes and Knative to provide serverless capabilities, which means that resources are only consumed when the model is being actively used for inference, leading to cost savings.
Overall, KServe aims to provide a scalable, flexible, and easy-to-use solution for deploying and managing ML models in production environments, leveraging the robust orchestration capabilities of Kubernetes.
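To illustrate the canary rollout feature mentioned above, below is a minimal sketch of an InferenceService that routes a small share of traffic to the latest model revision. The model name and storage URI here are hypothetical, and the sketch assumes the v1beta1 canaryTrafficPercent field; it is not the example we deploy later.
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "my-model"
spec:
  predictor:
    canaryTrafficPercent: 10   # send 10% of traffic to the newest revision
    model:
      modelFormat:
        name: tensorflow
      storageUri: "gs://my-bucket/models/my-model/v2"   # hypothetical model location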
Kubernetes Cluster Setup using Minikube
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube && rm minikube-linux-amd64
minikube start
kubectl get pods -A
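KServe, Istio, and Knative together need a fair amount of resources, so you may want to start Minikube with more CPU and memory than the defaults; the values below are only a suggestion:
minikube start --cpus=4 --memory=8192 --driver=docker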
Reference: https://minikube.sigs.k8s.io/docs/start/
Install KServe on this cluster
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.12/hack/quick_install.sh" | bash
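Once the script completes, you can verify that the components are up. Assuming the quick install uses its default namespaces (kserve for the KServe controller, istio-system for Istio, and knative-serving for Knative), the following commands should show the pods in a Running state:
kubectl get pods -n kserve
kubectl get pods -n istio-system
kubectl get pods -n knative-serving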
Create the InferenceService using kubectl
Applying tensorflow.yaml (shown below) creates the InferenceService; by default it exposes an HTTP/REST endpoint.
Create a file named tensorflow.yaml with the following content:
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "flower-sample"
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow
      storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
kubectl apply -f tensorflow.yaml
Wait for the InferenceService to reach the Ready state. Run the command below to check its status:
kubectl get isvc flower-sample
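Instead of polling, you can also block until the service reports Ready; this assumes the v1beta1 InferenceService exposes a Ready condition, which kubectl wait can watch:
kubectl wait --for=condition=Ready inferenceservice/flower-sample --timeout=600s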
Configure Ingress Host and Port
Execute the following command to determine whether your Kubernetes cluster is running in an environment that supports external load balancers:
kubectl get svc istio-ingressgateway -n istio-system
If the EXTERNAL-IP value is set, your environment has an external load balancer that you can use for the ingress gateway. In that case, export the ingress host and port as follows:
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
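On a Minikube cluster the EXTERNAL-IP will usually stay pending, because no cloud load balancer is available. One option is to run minikube tunnel in a separate terminal, which assigns an external IP to LoadBalancer services such as istio-ingressgateway:
minikube tunnel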
Alternatively, you can use port forwarding for testing purposes:
INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80
Once you run the port-forwarding command, that terminal will be blocked, so open a new terminal and export the ingress host and port:
export INGRESS_HOST=localhost
export INGRESS_PORT=8080
To run the prediction you will need an input file containing the image bytes (base64 encoded) for the image to be classified into the different flower categories. You can download the sample input.json from the KServe documentation for this flowers example.
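For reference, the request body follows the TensorFlow Serving predict format: a JSON object with an instances list, where each instance carries the base64-encoded image. The field names below are taken from the KServe flowers sample; the sketch is trimmed, and the actual file contains the full encoded image:
{
  "instances": [
    {
      "image_bytes": {
        "b64": "...base64-encoded image bytes..."
      },
      "key": "1"
    }
  ]
}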
Run the Prediction
MODEL_NAME=flower-sample
INPUT_PATH=@./input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
In the screenshot below you can see the prediction output.

Hope you enjoyed the post. Thank you very much for reading.
If you want to connect with me, then below are the various ways to stay connected professionally.
LinkedIn: https://www.linkedin.com/in/ashutoshtripathiai/
Instagram: https://www.instagram.com/ashutoshtripathi_ai/
Twitter: https://twitter.com/ashutosh_ai
Website: https://ashutoshtripathi.com
If you want to message me directly, then connect with me on LinkedIn and send a direct message.
If you want to discuss any topic around ML or have queries related to job switch, please feel free to schedule a 30 min call with me, I am available on topmate: https://topmate.io/ashutosh_ai