How to Run Ollama on Kubernetes: A Comprehensive Guide
Tasrie IT Services
Introduction to Ollama
Ollama is an open-source tool for running large language models, such as Llama 3.1, on your own infrastructure. Running Ollama on Kubernetes lets you take advantage of Kubernetes' scalability and reliability features, along with persistent storage for your models.
Table of Contents
- Introduction to Ollama
- Prerequisites
- Understanding the Ollama YAML Configuration
- Deploying Ollama on Kubernetes
- Verifying the Deployment
- Scaling Ollama
- Conclusion
- Call to Action
Prerequisites
Before you begin, make sure you have the following:
- A running Kubernetes cluster
- kubectl command-line tool configured to interact with your cluster (a quick verification sketch follows this list)
- Network access to pull the ollama/ollama image from Docker Hub
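If you want to confirm the first two before deploying, the commands below are a minimal check, assuming your current kubeconfig context already points at the target cluster:
# Confirm kubectl can reach the cluster's control plane
kubectl cluster-info
# Confirm the cluster has schedulable nodes
kubectl get nodes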
Understanding the Ollama YAML Configuration
The core of your deployment is defined in the ollama.yaml file. Here’s a breakdown of the key sections:
- StatefulSet Configuration: Runs the Ollama container, gives each pod its own persistent volume (50Gi requested from the gp3 StorageClass; substitute a StorageClass available in your cluster) mounted at /root/.ollama, and uses a postStart lifecycle hook to download the llama3.1:8b-instruct-q8_0 model the first time a pod starts.
- Service Configuration: Exposes Ollama's API on port 11434 inside the cluster so other workloads (and you, via port-forwarding) can reach it.
Sample Configuration
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ollama
  namespace: ai
spec:
  serviceName: "ollama"
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama
          ports:
            - containerPort: 11434
          volumeMounts:
            - name: ollama-storage
              mountPath: /root/.ollama
          lifecycle:
            postStart:
              exec:
                command: [ "/bin/sh", "-c", "if [ ! -f /root/.ollama/initialized ]; then ollama run llama3.1:8b-instruct-q8_0 && touch /root/.ollama/initialized; fi" ]
  volumeClaimTemplates:
    - metadata:
        name: ollama-storage
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: gp3
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ai
spec:
  ports:
    - port: 11434
      targetPort: 11434
  selector:
    app: ollama
Deploying Ollama on Kubernetes
Follow these steps to deploy Ollama:
- Create the Namespace: The manifests target the ai namespace, so create it first if it does not already exist.
kubectl create namespace ai
- Apply the Configuration: Run the following command to apply the YAML file.
kubectl apply -f ollama.yaml
- Verify the Pods: Ensure that the Ollama pods are running (an example of the expected output follows these steps).
kubectl get pods -n ai
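If everything is healthy, the pod created by the StatefulSet reports Ready. The output will look roughly like the following; note that the first start can take several minutes while the postStart hook downloads the model, and names and timings will differ in your cluster:
NAME       READY   STATUS    RESTARTS   AGE
ollama-0   1/1     Running   0          5m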
Verifying the Deployment
To verify that Ollama is running correctly, you can use the following command to check the service status:
kubectl get svc -n ai
You should see the Ollama service listed with the appropriate ports.
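Beyond listing the service, you can talk to the Ollama API directly. The following is a minimal sketch, assuming you run it from a machine with kubectl access (the local port choice is arbitrary):
# Forward the service port to your workstation
kubectl port-forward svc/ollama -n ai 11434:11434
# In a second terminal: the root endpoint responds with "Ollama is running",
# and /api/tags lists the models available on the server
curl http://localhost:11434/
curl http://localhost:11434/api/tags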
Scaling Ollama
If you need to scale your deployment, you can adjust the replica count in the StatefulSet configuration:
replicas: 3
Apply the changes using kubectl apply -f ollama.yaml.
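Alternatively, you can scale in place without editing the file. This is standard kubectl; because the StatefulSet uses volumeClaimTemplates, each new replica gets its own 50Gi volume and downloads the model on its first start:
# Scale the StatefulSet to three replicas in the ai namespace
kubectl scale statefulset ollama --replicas=3 -n ai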
Conclusion
Running Ollama on Kubernetes allows you to leverage the full power of Kubernetes for managing your AI workloads. With this guide, you should be able to deploy and scale Ollama efficiently.
Call to Action
For more detailed guides and tips on deploying AI applications on Kubernetes, visit our blog. Don't forget to subscribe to our newsletter for the latest updates and tutorials.
Deploy Ollama on Kubernetes today and take your AI capabilities to the next level! If you have any questions or need further assistance, feel free to contact us.
By following this guide, you're well on your way to running a successful Ollama deployment on Kubernetes. Enjoy the power and flexibility that Kubernetes offers for your AI applications!