First, we will look at the work completed to produce GPT in a Box running on Kubernetes at the data centre that hosts our Immersion platform. Our servers sit in a GRC tank filled with Thermal Management Fluid from BP Castrol, hosted at Centresquare's data centre in Reading, UK. UNICOM Engineering converted the Dell R750 nodes for immersion. We have already created an Ubuntu Linux VM on our Nutanix cluster to act as a jump box, which we will use to download the required dependencies such as the kubeconfig file, run Conda to manage our Python environments, and use the Helm package manager to install the NVIDIA drivers and deploy the Istio ingress gateway along with KServe's inference server.
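As a rough sketch of that jump-box preparation, the steps look something like the following. The file names, environment name, and paths are illustrative, and the cluster-facing Helm commands are shown commented out since they require the jump box to be connected to the NKE cluster:

```shell
# Point kubectl at the NKE cluster using the kubeconfig downloaded
# from Prism Central (path is an example; use your own download location).
export KUBECONFIG="$HOME/nke-cluster.cfg"

# Define an isolated Conda environment for the tooling
# (environment name and Python version are illustrative choices).
cat > environment.yml <<'EOF'
name: gpt-in-a-box
dependencies:
  - python=3.10
  - pip
EOF
# conda env create -f environment.yml   # run this on the jump box

# Register the official Helm chart repositories for the NVIDIA GPU
# Operator and Istio, then refresh the local chart index.
# helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
# helm repo add istio https://istio-release.storage.googleapis.com/charts
# helm repo update
```

With the repositories added, `helm install` is then used to deploy each component into its own namespace on the cluster.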
In preparation we have already created an NKE (Nutanix Kubernetes Engine) cluster on our servers, consisting of two control plane nodes, three worker nodes (one of which uses an NVIDIA A40 GPU), and three etcd nodes. We have also created a Nutanix Files cluster, as this is where we will generate the model archive, and this file share needs to be mounted on our jump box. A vital component of this design framework is a mapped PV and PVC. For those of you who aren't used to working with Kubernetes, a PV (Persistent Volume) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. If the containers on a node are drained or spun down, the storage volume still persists. This can be likened to persistent versus non-persistent desktops if you are familiar with VDI.
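One way to make the Nutanix Files share mount persist on the jump box is an `/etc/fstab` entry along these lines (the server name and export path here are placeholders for your own share details):

```
files.example.local:/model-store  /mnt/model-store  nfs  defaults,_netdev  0  0
```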
A PVC (Persistent Volume Claim) is a request for storage by a user. It is similar to a Pod: Pods consume node resources, and PVCs consume PV resources. Just as Pods can request specific levels of resources such as CPU and memory, PVCs can request a specific size and access mode. Although we don't use Nutanix Objects in this platform deployment, we have object buckets created which can replicate with our AWS S3 buckets if we need them to. The kubeconfig file allows us to run kubectl commands so that we can manage and administer our Kubernetes cluster. There are a number of very helpful commands in the Nutanix GPT in a Box open docs, which guide a user through the setup and installation of the various platform components and Python codebases.
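To make the PV/PVC pairing concrete, a minimal pair of manifests for an NFS-backed volume on the file share might look like this (names, server address, export path, and sizes are all illustrative, not our production values):

```yaml
# A PV that exposes the Nutanix Files NFS export to the cluster.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-store-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany        # NFS allows many pods to mount the same share
  nfs:
    server: files.example.local
    path: /model-store
---
# A PVC that claims the PV above so pods can reference it by name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-store-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""     # empty string skips dynamic provisioning
  volumeName: model-store-pv
  resources:
    requests:
      storage: 100Gi
```

Applying both with `kubectl apply -f` binds the claim to the volume, and any pod that mounts `model-store-pvc` sees the contents of the file share.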
After running through the deployment steps and processes, we have generated a model archive from a Hugging Face model; the archive is stored on our file share and mapped into our Kubernetes cluster using the PV and PVC. We can now run our model as a live chatbot using a Streamlit front-end web app. We are using a basic model which has been validated by Nutanix for GPT in a Box, but as you can see, it won't give the most sensible answers you are likely to see. As experts in data science, data engineering, and data warehousing, we will aim to deliver a custom model for this platform using a Meta Llama 3 model trained on a Hugging Face dataset.
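Serving the archive through KServe ties these pieces together. A sketch of an InferenceService that points KServe at the model archive on our claim could look like the following (the service name and PVC path are assumptions; the `pvc://` storage URI scheme is KServe's way of loading a model from a bound PersistentVolumeClaim):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-chatbot
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch          # TorchServe runtime consumes the .mar archive
      storageUri: pvc://model-store-pvc/   # claim holding the model archive
      resources:
        limits:
          nvidia.com/gpu: "1"  # schedule onto the A40 worker node
```

Once the InferenceService is ready, Istio's ingress gateway exposes the prediction endpoint, and the Streamlit app simply posts chat prompts to it.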