Nov 28, 2024

How to Run Databases on Kubernetes: An 8-Step Guide

Adetokunbo Ige

Even though almost no one questions using Kubernetes (K8s) to manage containerized applications today, many engineers (including me) remain skeptical about running databases on it. Databases are typically stateful applications that require persistent storage and data consistency, whereas Kubernetes built its reputation on stateless applications. To run databases on Kubernetes, you therefore need to ensure it can provide persistent storage, backup and restore, and high availability with failover.

In this tutorial, I’ll use the example of creating and running a MySQL database on Kubernetes to demonstrate how to manage stateful applications in Kubernetes. I will dive into key concepts such as StatefulSets, PersistentVolumes (PVs), PersistentVolumeClaims (PVCs) and StorageClasses. I’ll assume that you already have an understanding of both databases and Kubernetes.

Before I begin, it is vital to understand the difference between a stateless and a stateful application. Stateless applications do not keep data between requests; each request is processed independently, with nothing carried over from earlier requests. Stateful applications do keep data between requests and share it across sessions or pods. Workloads like databases need that data to be persistent.

Key Concepts for Running Databases on Kubernetes

Running databases such as MySQL, PostgreSQL and MongoDB on Kubernetes requires careful planning around persistent storage, stable network identities and scaling strategies. The following details need to be considered when running a database in Kubernetes.

Database Storage

Each database pod needs its own PV so that its data persists: even if the pod is deleted or restarted, the data remains intact. In practice, each database pod is bound to a dedicated PVC, which in turn is bound to a PV.

Scaling Databases

When scaling databases, it is very important to ensure data consistency. StatefulSets are a good fit for a leader-follower (primary-secondary) architecture, in which a primary database is paired with read-only replicas, as is common with PostgreSQL or MySQL. The primary handles updates and writes, while the replicas synchronize from it, providing both consistency and redundancy. Note that a StatefulSet supplies only the stable identities and storage this architecture relies on; replication itself must be configured in the database.
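As a rough illustration of the primary/replica split, the following ConfigMap sketch is modeled on the replicated MySQL example in the Kubernetes documentation. The ConfigMap name and the file names primary.cnf and replica.cnf are assumptions made for this sketch, and something (typically an init container, not shown here) would still have to copy the right file into each pod based on its ordinal.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-replication-config   # hypothetical name
  labels:
    app: mysql
data:
  primary.cnf: |
    # Intended for the primary only (for example, the pod with ordinal 0).
    # Binary logging lets replicas follow the primary's writes.
    [mysqld]
    log-bin
  replica.cnf: |
    # Intended for replicas only (ordinals 1 and up).
    # Rejects writes, even from privileged users.
    [mysqld]
    super-read-only
```

This only captures the configuration split; cloning data to a new replica and starting replication are separate steps that this sketch does not cover.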

Data Consistency and Backups

It is crucial to have a strategy to ensure data consistency across all database replicas and validate the integrity of the data. Regular backups and disaster recovery plans should be incorporated into your Kubernetes workflows. This must include routine (weekly or monthly) disaster recovery tests to validate the integrity of the database backup.

StatefulSets

A StatefulSet is a Kubernetes resource designed for managing stateful applications such as databases. It ensures that each pod keeps its persistent storage and that data remains intact even when pods are restarted. Key features of StatefulSets include:

  • Persistent storage: StatefulSets use PVs, which give each pod dedicated, stable storage that remains intact even after the pod restarts.
  • Stable network identifiers: Every pod in a StatefulSet receives a unique, consistent name that remains unchanged even when the pod is restarted or rescheduled; for example: mypod-0, mypod-1, mypod-2.

Tutorial: Create a Database on Kubernetes

To create a stateful application (such as a database) on Kubernetes, follow this step-by-step guide.

Step 1: Create a StorageClass (if You Don’t Have One)

A StorageClass in Kubernetes is similar in concept to a profile: it describes a class of storage that the cluster can provision. The storage class defines the provisioner, the storage type (for example, gp2 or gp3 on Amazon EBS) and other parameters for your PVs. You can also mark one storage class as the default; it is then used for dynamic volume provisioning of any PVC that does not specify a storage class.

Here is an example of a storage class created for Amazon EKS.

Create a new file called storage-class.yaml and copy this code into the file.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: ebs.csi.aws.com # Amazon EBS CSI driver (must be installed); the in-tree kubernetes.io/aws-ebs provisioner is deprecated. Use the correct provisioner for your cloud provider (AWS, GCP, Azure, etc.)
parameters:
  type: gp3
reclaimPolicy: Retain

Create the storage class by running:

kubectl apply -f storage-class.yaml
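If you want PVCs that omit storageClassName to use a particular class, that class has to be marked as the default. A minimal sketch, assuming the Amazon EBS CSI driver is installed; the class name gp3-default is hypothetical, and the two extra fields are optional choices rather than requirements:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-default   # hypothetical name
  annotations:
    # Marks this class as the cluster default for PVCs without a storageClassName.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com   # assumption: Amazon EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Retain
# Delay volume creation until a pod is scheduled, so the volume lands in the pod's zone.
volumeBindingMode: WaitForFirstConsumer
# Allow PVCs of this class to be resized later.
allowVolumeExpansion: true
```

Only one storage class should carry the default annotation at a time; check the existing classes with kubectl get storageclass before adding another default.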

Step 2: Create a PersistentVolume (PV)

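A PV is a piece of storage allocated in your Kubernetes cluster. If dynamic provisioning is enabled, Kubernetes will create a PV automatically whenever a PVC requests one. Otherwise, you can create one manually, as in the example below. Note that hostPath stores data on a single node's filesystem, so it is suitable only for single-node test clusters; in production, use a volume backed by network or cloud storage.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  hostPath:
    path: /mnt/data # Path on the host node's filesystem (test clusters only)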

Step 3: Create a PersistentVolumeClaim (PVC)

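A PVC serves as the interface between your application and its storage. It lets your application request storage of a given size and access mode, which Kubernetes satisfies from an available PV or, if dynamic provisioning is enabled, by creating a new one.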

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard
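To see how a workload consumes a claim, here is a minimal sketch of a pod that mounts mysql-pvc. The pod name, the busybox image and the /data mount path are assumptions chosen only for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo   # hypothetical name
spec:
  containers:
  - name: shell
    image: busybox:1.36
    # Write a marker file to the mounted volume, then stay alive for inspection.
    command: ["sh", "-c", "echo ok > /data/probe && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mysql-pvc   # the PVC defined above
```

The StatefulSet in Step 4 does not reference this claim; its volumeClaimTemplates section creates a separate PVC for each pod instead.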

Step 4: Deploy a MySQL StatefulSet

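This code snippet creates a StatefulSet for MySQL. It ensures each MySQL pod (instance) gets its own unique identifier, persistent storage and stable network identity. The volumeClaimTemplates section creates a separate PVC for each pod (mysql-storage-mysql-0, mysql-storage-mysql-1 and so on). Note that, as written, the three replicas are independent MySQL servers; replication between them must be configured separately.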

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0 # MySQL 5.7 reached end of life in October 2023
        ports:
        - containerPort: 3306
          name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "your_password" # Placeholder; store real credentials in a Secret
        volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
      storageClassName: standard
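Hardcoding the root password in the manifest is fine for a demo but not for real use. One common alternative is to keep it in a Kubernetes Secret. The sketch below assumes a Secret named mysql-secret with a key called root-password; both names are hypothetical.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret   # hypothetical name
type: Opaque
stringData:
  # Placeholder value; stringData is stored base64-encoded by the API server.
  root-password: change-me
```

The mysql container's env entry would then reference the Secret instead of a literal value. This is a fragment of the StatefulSet's container spec, not a standalone manifest:

```yaml
env:
- name: MYSQL_ROOT_PASSWORD
  valueFrom:
    secretKeyRef:
      name: mysql-secret
      key: root-password
```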

Step 5: Create a Headless Service for MySQL

Create a headless service for the MySQL StatefulSet so that the pods can communicate with each other inside the Kubernetes cluster. A headless service sets clusterIP to None, so DNS resolves directly to the individual pods rather than to a single load-balanced IP. The headless service in the example below is named mysql, matching the serviceName field in the StatefulSet. Each MySQL pod is reachable at <pod-name>.mysql (for example, mysql-0.mysql) from any pod in the same namespace.

# Headless service
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  clusterIP: None # This is what makes the service headless
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql
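One way to check that the per-pod DNS names resolve is to run a short-lived Job in the same namespace. This is a sketch under a few assumptions: the Job name is hypothetical, and the Job reuses a MySQL image only because it ships the mysqladmin client.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: mysql-dns-check   # hypothetical name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: ping
        image: mysql:8.0
        # mysqladmin ping exits 0 whenever the server answers,
        # even if it refuses the login, so no credentials are needed here.
        command: ["mysqladmin", "ping", "-h", "mysql-0.mysql"]
```

If the Job completes, the first pod is reachable by its stable name; kubectl logs job/mysql-dns-check shows the client's output.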

Step 6: Pipe MySQL Logs to Monitoring Tools

Monitoring MySQL is very important for tracking database performance, identifying bottlenecks and errors, and ensuring database health. The logs from the MySQL StatefulSet can be routed to monitoring tools such as Datadog, Grafana, Prometheus and Elasticsearch (the ELK Stack) to get full visibility into the performance and health of the database.

You need to configure MySQL to pipe logs to your monitoring tools. Commonly monitored logs include:

  • Slow query logs identify slow-running queries.
  • Error logs track errors and warnings.
  • General query logs track all MySQL queries.
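The logs listed above are controlled by MySQL server settings. A minimal sketch of a ConfigMap that enables the slow query log follows; the ConfigMap name, the file name logging.cnf, the log file path and the two-second threshold are all assumptions to adapt to your environment.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-logging   # hypothetical name
data:
  logging.cnf: |
    [mysqld]
    # Log statements that take longer than long_query_time seconds.
    slow_query_log = 1
    slow_query_log_file = /var/lib/mysql/slow.log
    long_query_time = 2
    # The general query log records every statement; keep it off unless debugging.
    general_log = 0
```

To take effect, the file would need to be mounted into the MySQL container's configuration directory (the official image reads additional files from /etc/mysql/conf.d), and a log-shipping agent of your choice would still have to forward the resulting files to your monitoring tool.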

Step 7: Perform Regular Backups and Routine Restore

It is very important to perform regular backups to protect the data in your Kubernetes workloads, and to run routine restore tests to validate the integrity of those backups.

Velero is an open source tool designed to safely back up and restore resources on Kubernetes clusters and PVs. It is an excellent solution for ensuring that your applications or databases do not experience any data loss. Velero offers essential functionalities such as Kubernetes cluster backup, restore, disaster recovery and scheduled backups. For more information, check out Velero’s documentation.
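Scheduled backups in Velero are described with a Schedule resource. The following is a sketch only, assuming Velero is installed in the velero namespace, the MySQL resources live in the default namespace and a volume snapshot provider is configured; the schedule name, cron expression and retention period are illustrative.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: mysql-daily   # hypothetical name
  namespace: velero   # assumption: Velero's install namespace
spec:
  schedule: "0 2 * * *"   # every day at 02:00
  template:
    includedNamespaces:
    - default             # assumption: namespace that holds the MySQL resources
    snapshotVolumes: true # also snapshot the PVs behind the PVCs
    ttl: 720h0m0s         # keep each backup for 30 days
```

Volume snapshots capture the disk at a point in time without coordinating with MySQL, so it is worth pairing them with the routine restore tests described above, or with logical dumps, to confirm the backups are usable.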

Step 8: Configure Database Alerts

In a Kubernetes environment where databases and other stateful applications run, it is crucial to set up alert notifications so that you can detect and respond to performance degradation, service disruption, downtime or data corruption as early as possible.

Monitoring tools such as Datadog, Nagios, Prometheus and Grafana can be used to monitor and check database health. They can be integrated with alert notification platforms such as Slack and PagerDuty, so an engineer will receive a notification (often a phone call) whenever there is a degradation in service or another issue with the database.
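As one possible setup, the sketch below defines an alert with Prometheus. It assumes the Prometheus Operator is installed (which provides the PrometheusRule resource) and that a MySQL exporter exposing the mysql_up metric is being scraped; the rule name, the two-minute window and the severity label are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mysql-alerts   # hypothetical name
spec:
  groups:
  - name: mysql
    rules:
    - alert: MySQLDown
      # mysql_up is 0 when the exporter cannot reach the MySQL server.
      expr: mysql_up == 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "MySQL instance {{ $labels.instance }} is not responding"
```

Routing the alert to Slack or PagerDuty is configured separately in Alertmanager, usually by matching on labels such as severity.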

Conclusion

Running databases in Kubernetes creates unique challenges, including state management, persistent storage and network stability. By using Kubernetes resources such as PersistentVolumes, PersistentVolumeClaims, StorageClasses and StatefulSets, administrators can manage database workloads in Kubernetes while preserving database integrity and availability.

As Kubernetes continues to evolve, its support for stateful workloads is likely to keep improving, making running databases in Kubernetes a powerful option for modern infrastructures.

About the author: Adetokunbo Ige

Adetokunbo Ige is a technologist for Andela, a private global talent marketplace. A seasoned platform engineer and a Certified ISO 22301 Lead Implementer in Business Continuity, he brings a wealth of experience in software engineering, enterprise application management, server infrastructure management, database management, incident management, and cloud engineering. He holds a B.Sc in computer science from Babcock University and an M.Sc in business information technology from Middlesex University, where he graduated with distinction. His technical proficiencies span various programming languages and tools including SQL Server, Oracle, MySQL, Docker, Kubernetes, and numerous scripting languages.
