How to Run Ollama on Kubernetes: A Comprehensive Guide
Tasrie IT Services
Introduction to Ollama
Ollama is an open-source tool for running large language models, such as Llama 3.1, on your own infrastructure. Running Ollama on Kubernetes lets you take advantage of Kubernetes' scalability and reliability features, along with persistent storage for your models.
Table of Contents
- Introduction to Ollama
- Prerequisites
- Understanding the Ollama YAML Configuration
- Deploying Ollama on Kubernetes
- Verifying the Deployment
- Scaling Ollama
- Conclusion
- Call to Action
Prerequisites
Before you begin, make sure you have the following:
- A running Kubernetes cluster
- kubectl command-line tool configured to interact with your cluster (a quick verification sketch follows this list)
- Network access to pull the ollama/ollama image from Docker Hub
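If you want to confirm the first two before deploying, the commands below are a minimal check, assuming your current kubeconfig context already points at the target cluster:
# Confirm kubectl can reach the cluster's control plane
kubectl cluster-info
# Confirm the cluster has schedulable nodes
kubectl get nodes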
Understanding the Ollama YAML Configuration
The core of your deployment is defined in the ollama.yaml file. Here’s a breakdown of the key sections:
- StatefulSet Configuration: Runs the Ollama container, gives each pod its own persistent volume (50Gi requested from the gp3 StorageClass; substitute a StorageClass available in your cluster) mounted at /root/.ollama, and uses a postStart lifecycle hook to download the llama3.1:8b-instruct-q8_0 model the first time a pod starts.
- Service Configuration: Exposes Ollama's API on port 11434 inside the cluster so other workloads (and you, via port-forwarding) can reach it.
Sample Configuration
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ollama
  namespace: ai
spec:
  serviceName: "ollama"
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama
          ports:
            - containerPort: 11434
          volumeMounts:
            - name: ollama-storage
              mountPath: /root/.ollama
          lifecycle:
            postStart:
              exec:
                command: [ "/bin/sh", "-c", "if [ ! -f /root/.ollama/initialized ]; then ollama run llama3.1:8b-instruct-q8_0 && touch /root/.ollama/initialized; fi" ]
  volumeClaimTemplates:
    - metadata:
        name: ollama-storage
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: gp3
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ai
spec:
  ports:
    - port: 11434
      targetPort: 11434
  selector:
    app: ollama
Deploying Ollama on Kubernetes
Follow these steps to deploy Ollama:
- Create the Namespace: The manifests target the ai namespace, so create it first if it does not already exist.
kubectl create namespace ai
- Apply the Configuration: Run the following command to apply the YAML file.
kubectl apply -f ollama.yaml
- Verify the Pods: Ensure that the Ollama pods are running (an example of the expected output follows these steps).
kubectl get pods -n ai
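If everything is healthy, the pod created by the StatefulSet reports Ready. The output will look roughly like the following; note that the first start can take several minutes while the postStart hook downloads the model, and names and timings will differ in your cluster:
NAME       READY   STATUS    RESTARTS   AGE
ollama-0   1/1     Running   0          5m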
Verifying the Deployment
To verify that Ollama is running correctly, you can use the following command to check the service status:
kubectl get svc -n ai
You should see the Ollama service listed with the appropriate ports.
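Beyond listing the service, you can talk to the Ollama API directly. The following is a minimal sketch, assuming you run it from a machine with kubectl access (the local port choice is arbitrary):
# Forward the service port to your workstation
kubectl port-forward svc/ollama -n ai 11434:11434
# In a second terminal: the root endpoint responds with "Ollama is running",
# and /api/tags lists the models available on the server
curl http://localhost:11434/
curl http://localhost:11434/api/tags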
Scaling Ollama
If you need to scale your deployment, you can adjust the replica count in the StatefulSet configuration:
replicas: 3
Apply the changes using kubectl apply -f ollama.yaml.
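Alternatively, you can scale in place without editing the file. This is standard kubectl; because the StatefulSet uses volumeClaimTemplates, each new replica gets its own 50Gi volume and downloads the model on its first start:
# Scale the StatefulSet to three replicas in the ai namespace
kubectl scale statefulset ollama --replicas=3 -n ai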
Conclusion
Running Ollama on Kubernetes allows you to leverage the full power of Kubernetes for managing your AI workloads. With this guide, you should be able to deploy and scale Ollama efficiently.
Call to Action
For more detailed guides and tips on deploying AI applications on Kubernetes, visit our blog. Don't forget to subscribe to our newsletter for the latest updates and tutorials.
Deploy Ollama on Kubernetes today and take your AI capabilities to the next level! If you have any questions or need further assistance, feel free to contact us.
By following this guide, you're well on your way to running a successful Ollama deployment on Kubernetes. Enjoy the power and flexibility that Kubernetes offers for your AI applications!