Cloudscale Engineering Blog RSS Feed

DIY AI Chatbot with Ollama, Open WebUI & DeepSeek-R1 on NVIDIA L40S

Mon, 14 Apr 2025 00:00:00 GMT

But why? There are already many free AI chatbots available…

With so many AI chatbots available for free, why go through the trouble of setting up a self-hosted one, besides technical curiosity? The answer lies in the often repeated and always neglected concerns about privacy, data protection, and trust:

Privacy & data protection: Your chat conversations are often logged, analyzed, and stored. Even if providers claim to anonymize data, you can never be certain how your information is being used.
Company secrets: Using an external AI chatbot risks exposing trade secrets, internal strategies, or confidential client information.
If a service is free, you are the product: This by now worn-out phrase still holds true.

Evaluation

Disclaimer: I did not spend much time evaluating the stack and got a lot of suggestions from my teammate Alain, who already has more experience with LLMs (large language models).

TLDR:

Web UI: Open WebUI
LLM Management: Ollama
LLM: DeepSeek-R1 70B

Web-Based Chat Interface

In my short research, I found two very promising open-source tools for an AI chatbot web UI:

Open WebUI: Simpler, 2 Docker services
LibreChat: More advanced, ~5 Docker services

I went for the seemingly more lightweight solution, which is Open WebUI.

LLM Management Tool

Ollama is a really easy-to-use solution for managing LLMs, is natively supported by Open WebUI (no additional configuration needed), and also installs all necessary drivers and dependencies, so I just went with this tool.

Large Language Model

Here comes the tricky part: selecting the right model. To ensure maximum performance, it needs to fit into the GPU's VRAM. The NVIDIA L40S GPUs provided at cloudscale have 48 GB of GDDR6 VRAM each. Even though it is possible to use multiple L40S GPUs to run even larger models, I wanted to use a single GPU for this setup. The VRAM requirements can be found on the Ollama website or in other model repositories. The DeepSeek-R1 70B model requires approximately 41 GB of VRAM, making it a great fit for this GPU.
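The fit check boils down to simple arithmetic. A minimal sketch using the figures from this post (48 GB of VRAM, roughly 41 GB for DeepSeek-R1 70B); the function name and the optional headroom parameter are illustrative assumptions, not something Ollama provides:

```python
def fits_in_vram(model_gb: float, vram_gb: float, headroom_gb: float = 0.0) -> bool:
    """Return True if the model plus the requested headroom fits into VRAM."""
    return model_gb + headroom_gb <= vram_gb


L40S_VRAM_GB = 48        # single NVIDIA L40S, as stated above
DEEPSEEK_R1_70B_GB = 41  # approximate requirement, as stated above

print(fits_in_vram(DEEPSEEK_R1_70B_GB, L40S_VRAM_GB))  # True
print(L40S_VRAM_GB - DEEPSEEK_R1_70B_GB, "GB left")    # 7 GB left
```

The remaining gigabytes are not wasted: the runtime also needs memory for the context of a conversation, so some headroom is welcome.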

Instructions

Setting up a GPU server on cloudscale

GPU servers are subject to Addendum for GPU servers / Vertragszusatz für GPU-Server. If you are interested or have any questions, please contact support.

I just created a GPU VM via the cloudscale control panel with the following specifications:

Flavor: GPU1-160-20-1-400 (could also be a GPU flavor with less RAM or fewer CPU cores, the GPU is what matters)
GPU Type: 1x NVIDIA L40S
Scratch Disk: 400 GB on RAID 1 (this will be handy for persisting the LLM)
Source Image: Debian 12 - Bookworm

If you want to follow this guide, I strongly recommend using Debian as the base image, as it ensures that all the steps will work as expected.

Mount the Scratch Disk

Even though we can load models up to 48 GB directly into the NVIDIA L40S's VRAM, we need to download and persist the models before we can run them. For this reason, at cloudscale, every GPU server comes equipped with an additional, local Scratch Disk. The Scratch Disk is tied directly to the server, but offers better performance than our usual volumes. Like other volumes, we have to mount it in our VM first:

# List all disks
lsblk -l

# Create a new folder for the mount
sudo mkdir -p /mnt/scratch

# Identify and mount Scratch Disk, e.g.: /dev/sdb
sudo mount /dev/sdb /mnt/scratch

# Get the device's UUID (note it somewhere or copy to clipboard)
sudo blkid /dev/sdb

# Persist the Scratch Disk mount
sudo vim /etc/fstab

# Add the following line
UUID= /mnt/scratch ext4 defaults 0 0

Later, we will configure Ollama to use the Scratch Disk to download and persist the models.
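If you prefer not to copy the UUID by hand, the fstab entry can be derived from the blkid output. A minimal sketch, assuming the usual KEY="value" output format of blkid; the helper name and the sample UUID are made up for illustration:

```python
import re


def fstab_line(blkid_output: str, mount_point: str) -> str:
    """Build an /etc/fstab entry from one line of `blkid` output."""
    uuid = re.search(r'\bUUID="([^"]+)"', blkid_output)
    fstype = re.search(r'\bTYPE="([^"]+)"', blkid_output)
    if uuid is None or fstype is None:
        raise ValueError("blkid output contains no UUID or TYPE")
    return f"UUID={uuid.group(1)} {mount_point} {fstype.group(1)} defaults 0 0"


# Made-up sample output, not taken from a real server
sample = '/dev/sdb: UUID="0a1b2c3d-1111-2222-3333-444455556666" BLOCK_SIZE="4096" TYPE="ext4"'
print(fstab_line(sample, "/mnt/scratch"))
```

After editing /etc/fstab, running sudo mount -a is a quick way to check that the new entry is valid before the next reboot.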

Manually install NVIDIA drivers (optional)

This step can be skipped entirely, because the Ollama install script will install all necessary dependencies automatically. Follow this guide only if you are interested in how to install the NVIDIA GPU drivers on a VM manually.

In the following section, I will give you a step-by-step guide for installing the necessary NVIDIA GPU driver on Debian 12 - Bookworm.

# SSH into the server
ssh debian@

# Upgrade Debian to the latest version
# In the presented dialog, you can just confirm the default setting
sudo apt update && sudo apt upgrade -y

# Edit the sources list to enable non-free software (e.g. NVIDIA Drivers)
sudo vim /etc/apt/sources.list

The file should be updated as follows:

# See /etc/apt/sources.list.d/debian.sources
deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware
deb http://deb.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
deb http://deb.debian.org/debian bookworm-updates main contrib non-free non-free-firmware

# Update the package list
sudo apt update

# Install the NVIDIA driver package
# Multiple dialogs will pop up, I just confirmed the default settings as they seemed reasonable enough
sudo apt install nvidia-driver

# Reboot VM as recommended
sudo reboot

# SSH into the rebooted server
ssh debian@

# Verify that the NVIDIA GPU drivers are installed and the GPU is correctly recognized by the VM
nvidia-smi

Install Ollama

# SSH into the server
ssh debian@

# Download and execute the Ollama install script
curl -fsSL https://ollama.com/install.sh | sh

# Reboot VM as recommended
sudo reboot

# SSH into the rebooted server
ssh debian@

# Verify that the NVIDIA GPU drivers are installed and the GPU is correctly recognized by the VM
nvidia-smi

# Verify that Ollama is installed successfully
ollama -v
sudo systemctl status ollama

# Configure Ollama to use the mounted Scratch Disk instead of the root volume
sudo mkdir -p /mnt/scratch/ollama_models
sudo chown ollama:ollama /mnt/scratch/ollama_models/

# Edit the ollama service configuration
sudo vim /etc/systemd/system/ollama.service

# Add the following line as the last one in the [Service] section
Environment="OLLAMA_MODELS=/mnt/scratch/ollama_models"

# Reload the ollama service
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Download a small model to verify that it's stored on the Scratch Disk, the directory should not be empty any more
ollama pull smollm
ls /mnt/scratch/ollama_models/

# Clean up the unused model
ollama rm smollm

# Install and run DeepSeek's R1
ollama run deepseek-r1:70b

# You can now test the LLM via CLI, exit with ctrl+d
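Besides the CLI, Ollama also exposes a local HTTP API, which is what Open WebUI talks to later on. A minimal sketch for sending a prompt from Python, assuming Ollama listens on its default address 127.0.0.1:11434; the helper names build_request and ask are my own:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default local address


def build_request(prompt: str, model: str = "deepseek-r1:70b") -> urllib.request.Request:
    """Prepare a non-streaming generate request for the local Ollama API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to Ollama and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=600) as resp:
        return json.load(resp)["response"]
```

On the GPU server, print(ask("Why is the sky blue?")) should return an answer once the model is loaded. The generous timeout is deliberate, since the first request has to load the model into VRAM.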

Secure your service (optional)

If you plan to keep the service online, it's strongly advised to follow this section, or to implement alternative measures to protect it from unwanted access. This setup requires that you have a domain name.

Install nginx with certbot

In this guide, I will install and configure the web server nginx with certbot, to restrict access and enable HTTPS for Open WebUI. Before you start, make sure that your domain's or subdomain's A and AAAA records are pointing to the GPU server's public IPv4 and IPv6 addresses respectively.

# Install nginx, certbot and certbot nginx plugin
sudo apt install nginx certbot python3-certbot-nginx

# Verify installation
sudo nginx -v

# Edit the nginx default configuration
sudo vim /etc/nginx/sites-available/default

# Add the addresses you are using to access the internet to the two allow directives below.
# If you don't want to protect your service from unwanted access or you have other measures in place (e.g.: HTTP BasicAuth), the following three lines can be removed.
allow ;
allow ;
deny all;

upstream open_webui {
    # We will configure Open WebUI to listen on this port
    server 127.0.0.1:8080;
}

server {
    listen 80 default_server;
    listen [::]:80 default_server;

    root /var/www/html;

    index index.html;

    # Set server_name to the domain where you configured the A and AAAA records; the certbot plugin will also use this to automatically extend this file with the HTTPS configuration
    server_name ;

    # The proxy configuration is taken from: https://docs.openwebui.com/tutorials/https-nginx/#steps
    location / {
            proxy_pass http://open_webui;

            # Add WebSocket support (Necessary for version 0.5.0 and up)
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";

            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # (Optional) Disable proxy buffering for better streaming response from models
            proxy_buffering off;

            # (Optional) Increase max request size for large attachments and long audio messages
            client_max_body_size 20M;
            proxy_read_timeout 10m;
        }
}

# Verify that the nginx config is valid
sudo nginx -t

# Reload configuration
sudo nginx -s reload

# Setup certbot
sudo certbot --nginx -d 

# Check the changes made by certbot:
# There should be new entries 'managed by certbot'
cat /etc/nginx/sites-available/default

Now you can open https:/// in a browser to verify that the TLS certificate is correctly set up for your domain and that the page is served via HTTPS. The response will be "502 Bad Gateway", as the Open WebUI service is not reachable yet. If you receive a "403 Forbidden", this probably means you have to adjust the allow directives in the nginx configuration.
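If the "403 Forbidden" is puzzling, it helps to remember that nginx checks the allow and deny rules in order and stops at the first match. A minimal sketch of that evaluation, not nginx's actual code; the addresses are placeholder documentation ranges, not real client addresses:

```python
import ipaddress

# Rules in the order they appear in the config; placeholder documentation
# ranges (RFC 5737 / RFC 3849) stand in for your own addresses.
RULES = [
    ("allow", "203.0.113.7/32"),
    ("allow", "2001:db8::/64"),
    ("deny", "all"),
]


def is_allowed(client_ip: str, rules=RULES) -> bool:
    """Mimic nginx access rules: the first matching rule decides."""
    ip = ipaddress.ip_address(client_ip)
    for action, target in rules:
        if target == "all":
            return action == "allow"
        net = ipaddress.ip_network(target)
        if ip.version == net.version and ip in net:
            return action == "allow"
    return True  # no rule matched: the request is allowed


print(is_allowed("203.0.113.7"))   # True
print(is_allowed("198.51.100.1"))  # False, answered with 403 Forbidden
```

This also explains why the deny all line has to come last: placed first, it would match every request before the allow rules are ever considered.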

Install UFW (Uncomplicated Firewall)

# Install UFW
sudo apt install ufw

# Allow egress (we will monitor egress with OpenSnitch)
sudo ufw default allow outgoing

# Block ingress
sudo ufw default deny incoming

# Allow services, if you forget to allow SSH you will be blocked from accessing the VM!
sudo ufw allow ssh
sudo ufw allow vnc
sudo ufw allow http
sudo ufw allow https

# Enable UFW
sudo ufw enable

# Check status
sudo ufw status

# Verify that apt and certbot can still do their job
sudo apt update
sudo certbot renew --dry-run

Install Open WebUI

In this section, I will install Open WebUI into a Python Virtualenv, but there are other installation methods (e.g.: with Docker).

# Install dependencies
sudo apt install python3 python3-venv

# Verify python version is 3.11 to avoid compatibility issues
python3 --version

# Create a new user for running Open WebUI
sudo useradd -m open_webui

# Change directory and switch to user open_webui
cd /home/open_webui
sudo su open_webui

# Create a new directory and add a virtual environment
python3 -m venv venv

# Install Open WebUI in the virtual environment
venv/bin/pip install open-webui

# Start Open WebUI via the virtual environment, the default port is 8080
venv/bin/open-webui serve

You can now verify that your own Open WebUI is running, and create your admin account:

If you skipped the "Secure your service" section: http://:8080
Otherwise: https://

Now let's create a systemd service, which runs Open WebUI in the background:

# Ctrl+c to stop Open WebUI and exit from the user open_webui
exit

# Create a new systemd service configuration
sudo vim /etc/systemd/system/open-webui.service

[Unit]
Description=Open WebUI Service
After=network.target

[Service]
Type=simple
WorkingDirectory=/home/open_webui
ExecStart=/home/open_webui/venv/bin/open-webui serve
KillSignal=SIGTERM
KillMode=mixed
Restart=always
RestartSec=3
StandardOutput=journal
StandardError=journal
User=open_webui
Group=open_webui

[Install]
WantedBy=multi-user.target

# Reload systemd
sudo systemctl daemon-reload

# Enable and start open-webui service
sudo systemctl enable open-webui.service
sudo systemctl start open-webui.service

# Check for any errors
sudo systemctl status open-webui.service
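To double-check from a script that the service is really listening behind nginx, a plain TCP connection test is enough. A minimal sketch, assuming Open WebUI runs on its default port 8080 on the same machine; the helper name is my own:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the GPU server, port_is_open("127.0.0.1", 8080) should return True shortly after the service has started. If it stays False, sudo journalctl -u open-webui.service shows the service's log output.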

Today I Learned: GitLab Fleeting Edition

Thu, 27 Mar 2025 00:00:00 GMT

In collaboration with Puzzle ITC's Yannik Dällenbach, I recently worked on publishing fleeting-plugin-cloudscale. This plugin makes it possible to dynamically scale GitLab Runner instances on cloudscale.ch.

While the bulk of the work is all thanks to Yannik's efforts, I had the chance to integrate his work into our open source landscape. This is what I learned.

GitLab Fleeting Works Well

Originally, GitLab used the Docker Machine driver to offer autoscaled runner instances. This tool has been deprecated for years. The alternative takes the form of a Go plugin.

With only a few short functions, Yannik was able to implement the necessary logic to launch, list, and clean up runner instances. It's easy to write such a framework in a way that makes things cumbersome. GitLab has avoided that pitfall.

The GitLab runner process uses the plugin to react to tasks popping up in the CI pipeline, ensuring there is always enough capacity when it is actually needed.

Fleeting Plugin Packaging Is Different

I figured that publishing this plugin would involve a container image. This turned out to be right, but how GitLab does it was a bit unexpected:

Using a bespoke fleeting-artifact command, we build a container image, given a set of architecture-specific binaries. Go's cross-compilation story is top-notch, so providing these binaries is easy, but it is unexpected to not use generic container image build tools.

I think the approach has its merits, but I might have saved some time had I known about it sooner. I only realized that my container images did not work after I had built them using ko.

Go 1.24 Tools

This was my first project where I got to try out Go 1.24's go tool command. It is used to integrate tools related to the development of the project into dependency tracking. In our case, this enabled me to integrate GoReleaser and the mentioned fleeting-artifact.

To install these tools, I ran the following:

go get -tool -modfile tool.mod github.com/goreleaser/goreleaser/v2@latest
go get -tool -modfile tool.mod gitlab.com/gitlab-org/fleeting/fleeting-artifact/cmd/fleeting-artifact
go mod tidy -modfile tool.mod

This makes these tools available as follows:

go tool -modfile tool.mod fleeting-artifact
go tool -modfile tool.mod goreleaser

What I like about this approach, though it is a bit verbose, is the fact that I can run the exact same tool locally, and in the CI, and across all our workstations.

Dev-tooling should be versioned as well, to keep its results stable, and go tool accomplishes that.

Zizmor

I like linters, static analyzers, language-servers, and so on. Anything that supplements my knowledge with community-wisdom. I didn't know about Zizmor before, but I rarely write GitHub workflows, and I figured: Someone must have written a validator for these.

Indeed someone has: Zizmor's goal is to protect the user from the following (and more):

Template injection vulnerabilities, leading to attacker-controlled code execution.
Accidental credential persistence and leakage.
Excessive permission scopes and credential grants to runners.
Impostor commits and confusable git references.

It's not precisely a validator, or at least I did not come to rely on it as such, but it pointed out some potential security issues with my GitHub workflows, and I learned some newer directives I missed before.

As a result, the GitHub workflows in place for fleeting-plugin-cloudscale are now hardened and checked for issues on each commit.

Conclusion

Adopting and packaging someone else's hard work was a somewhat new experience for me. With most of the groundwork already laid, I was able to spend more time on packaging and testing, which I think the final result reflects.

If you want to check out our plugin, see:

https://github.com/cloudscale-ch/fleeting-plugin-cloudscale

If you want to demo it, use our Ansible playbook that configures a GitLab instance, adds a GitLab Runner, the fleeting-plugin-cloudscale plugin, and optionally a distributed S3 cache, all in one go:

https://github.com/cloudscale-ch/gitlab-runner

You'll be presented with a GitLab instance where CI jobs automatically work, scale, and share their cache. It's a great demo, and a good starting point to introduce GitLab into your own organization. Everything inside our infrastructure, far away from Five Eyes.

Improving metrics collection for our Object Storage

Wed, 29 Jan 2025 00:00:00 GMT

Cloudscale offers S3-compatible Object Storage built on top of our Ceph storage cluster with three-fold replication. To provide the S3 service, we use Ceph's RADOS Gateway (radosgw). While radosgw includes built-in usage tracking, we found its metrics insufficient for the needs of a public cloud provider like us.

In the Objects tab in our Control Panel, customers can view their exact usage over time and, for example, see the number of requests on a specific date. This detailed data is not just for customer insights; it is also critical for accurate billing. Besides the number of requests, the metrics include the number of objects, the used storage, and the network traffic.

To bridge the gap in capabilities, we developed our own solution: rgw-metrics.

What is rgw-metrics?

rgw-metrics is a microservice that repeatedly collects the current usage data for every bucket from radosgw. This data is aggregated into hourly segments, which are persisted. The usage data is then queried by the Control Panel through an API provided by rgw-metrics. This API is quite narrow and has been stable over the years. It only allows fetching metrics for one or more object users.

┌────────────────┐       ┌───────────────┐       ┌─────────────────┐
│ Control Panel  ├──────►│  rgw-metrics  ├──────►│  RADOS Gateway  │
└────────────────┘       └───────────────┘       └─────────────────┘

rgw-metrics is designed as a standalone microservice running on both of our sites, so it operates independently of the Control Panel. This independence ensures metrics are consistently collected, even during extended maintenance periods.
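The hourly aggregation can be pictured as follows. This is a simplified sketch and not the actual rgw-metrics code: the tuple layout, the function name, and the sample numbers are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_hourly(samples):
    """Sum request counts per (bucket, hour) segment.

    `samples` is an iterable of (timestamp, bucket, requests) tuples.
    """
    segments = defaultdict(int)
    for timestamp, bucket, requests in samples:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        segments[(bucket, hour)] += requests
    return dict(segments)


# Made-up samples: two collections in the 10:00 segment, one in the 11:00 segment
samples = [
    (datetime(2025, 1, 29, 10, 5), "backups", 120),
    (datetime(2025, 1, 29, 10, 35), "backups", 80),
    (datetime(2025, 1, 29, 11, 5), "backups", 10),
]
for (bucket, hour), total in sorted(aggregate_hourly(samples).items()):
    print(bucket, hour.isoformat(), total)
```

Only the per-hour totals need to be persisted, which keeps the stored data small no matter how often the collector runs.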

A journey of evolution

The first version of rgw-metrics was written in Flask back in 2017 when we first introduced our S3 storage. While functional, the application had received little maintenance since its launch. Over time, this led to challenges: the outdated dependencies, the manual deployment steps, and the fact that the Control Panel is built with a different framework, Django, made engineers cautious about touching the application.

To address these issues, we decided on a black-box rewrite of rgw-metrics, transitioning it from Flask to Django.

The black-box rewrite approach

To ensure a seamless transition, we prioritized maintaining the existing API's behavior. That way we were able to create a collection of tests to validate the new service against the existing one. For instance, we compared the historical usage data of our public acceptance tests over the past year, together with that of countless other internal projects using the Object Storage. During development, we ran a script to compare the output of the new Django-based service with the original Flask-based implementation. This ensured the output of the new service matched the old one under various scenarios.

# essentially, it was automating these steps:
curl -H "$AUTH_HEADER" "https://old-api.cloudscale.ch/v1/metrics/buckets?start=2023-12-31&end=2024-01-01" > "export_flask/metrics.json"
curl -H "$AUTH_HEADER" "https://api.cloudscale.ch/v1/metrics/buckets?start=2023-12-31&end=2024-01-01" > "metrics.json"
diff export_flask/metrics.json metrics.json

Thanks to this test-driven method we actually found multiple bugs, including one in our data migration scripts, where an existing column was copied to the wrong target column.
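A plain diff works, but it is sensitive to key order and formatting. A structural comparison avoids that. The following is a minimal sketch, assuming both exports are JSON objects; the bucket names and the response shape are hypothetical and not the real API format:

```python
import json

_MISSING = object()  # marks a key that is absent on one side


def diff_exports(old_json: str, new_json: str) -> list[str]:
    """Return the top-level keys whose values differ between two JSON objects."""
    old, new = json.loads(old_json), json.loads(new_json)
    return sorted(
        key
        for key in old.keys() | new.keys()
        if old.get(key, _MISSING) != new.get(key, _MISSING)
    )


# Hypothetical exports from the old and the new service
old_export = '{"bucket-a": {"requests": 10}, "bucket-b": {"requests": 5}}'
new_export = '{"bucket-b": {"requests": 6}, "bucket-a": {"requests": 10}}'
print(diff_exports(old_export, new_export))  # ['bucket-b']
```

An empty list means the two services agree for the queried period, regardless of the order in which they serialize their keys.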

What is up next for rgw-metrics?

With the rewrite complete, rgw-metrics now benefits from up-to-date dependencies, a container-based deployment similar to that of our main application, and a similar structure, which will help us develop additional features.

With the foundation strengthened, we are ready to tackle upcoming improvements like the efficient detection of large buckets: Each bucket has a limit of 10 million objects. Beyond this threshold, performance may degrade. Currently, we proactively contact users approaching this limit. However, gathering the necessary data through the current endpoints is suboptimal, as it requires iterating over every bucket for each object user. The number of object users is growing every day, which forces us to extend the API to allow for more efficient queries on large buckets, reducing overhead and improving responsiveness.
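The check itself is simple once the object counts are available in one place. A minimal sketch, assuming a mapping from bucket name to object count; the 80% warning threshold, the function name, and the sample data are assumptions and not our actual policy:

```python
OBJECT_LIMIT = 10_000_000  # per-bucket limit mentioned above


def buckets_near_limit(object_counts, threshold=0.8):
    """Return names of buckets at or above `threshold` of the limit, largest first."""
    near = [
        (count, name)
        for name, count in object_counts.items()
        if count >= threshold * OBJECT_LIMIT
    ]
    return [name for count, name in sorted(near, reverse=True)]


# Made-up object counts
counts = {"logs": 9_500_000, "assets": 120_000, "archive": 8_000_000}
print(buckets_near_limit(counts))  # ['logs', 'archive']
```

The expensive part today is collecting the counts in the first place, which is exactly what a dedicated API endpoint would make cheaper.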

Stay tuned as we continue to enhance our metrics system and provide an even better Object Storage experience for our users.

Using Our Load Balancer to Set Up a Highly Available Kubernetes Control Plane

Tue, 07 Jan 2025 00:00:00 GMT

One of the great things about working as an engineer at cloudscale is that we get to work on many different products, technologies, and projects. When we began developing the load balancer as a service product, a crucial requirement was that it must be usable for creating highly available Kubernetes control planes. During the whole development process, I was looking forward to bootstrapping my first cluster using this new product. And now that we have this blog, I want to share a few notes on how to create a highly available, stacked Kubernetes control plane on three cloudscale Ubuntu VMs using containerd.

Provisioning the Cloud Infrastructure

The Kubernetes Documentation instructs us to set up the following:

In a cloud environment you should place your control plane nodes behind a TCP forwarding load balancer. This load balancer distributes traffic to all healthy control plane nodes in its target list. The health check for an apiserver is a TCP check on the port the kube-apiserver listens on (default value :6443).

This means that we'll need:

A Load Balancer
A Load Balancer Listener using port 6443
A Load Balancer Pool with round-robin algorithm
A Load Balancer Pool Member for each VM
A Load Balancer Health Monitor checking Port 6443 on the VMs
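To picture how these pieces interact, here is a simplified model of a round-robin pool that skips members marked as unhealthy. It is only an illustration of the concept, not how the cloudscale load balancer is implemented; the class name is my own, and the addresses are the private IPs used later in this post.

```python
class Pool:
    """Round-robin pool that only hands out members considered healthy."""

    def __init__(self, members):
        self.members = list(members)
        self.healthy = set(self.members)  # updated by a health monitor
        self._next = 0

    def pick(self):
        """Return the next healthy member in round-robin order."""
        for _ in range(len(self.members)):
            member = self.members[self._next]
            self._next = (self._next + 1) % len(self.members)
            if member in self.healthy:
                return member
        raise RuntimeError("no healthy pool member available")


pool = Pool(["10.11.12.21", "10.11.12.22", "10.11.12.23"])
pool.healthy.discard("10.11.12.21")  # health check on port 6443 failed
print([pool.pick() for _ in range(4)])
# ['10.11.12.22', '10.11.12.23', '10.11.12.22', '10.11.12.23']
```

The model also shows why every control plane node needs its own pool member: a pool with a single member has nothing to fail over to.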

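Since the easiest way to set all this up and get all the VMs running is with Terraform, I have provided a Terraform file (see appendix) if you want a quick start. The Terraform file sets up the following:

A private network named backend with a subnet
An anti-affinity server group
Three control nodes running Ubuntu 24.04
The load balancer with its pool, listener, pool members, and health monitor

Ensure Terraform is installed on your machine. Then, navigate to the Terraform file's directory and run: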

terraform init

Once initialized, export a read/write API token from a (preferably empty) project and create the infrastructure by running:

export CLOUDSCALE_API_TOKEN="..."
terraform apply

Terraform will display a preview of the resources it plans to create or update and prompt you for confirmation. Type yes to proceed.

Terraform will also output three variables at the end: kube_api_lb_ip, server_ips_private, and server_ips_public. We'll need this information later on. Ensure that you can SSH into all VMs as the ubuntu user via the public IP addresses.

Installing kubeadm and containerd

Now, fasten your seatbelt. Here is a condensed summary of the following articles from the Kubernetes Documentation: Installing kubeadm, Container Runtimes, Creating Highly Available Clusters with kubeadm. I have taken some shortcuts and left out some things I deem irrelevant for non-production setups.

All commands must be run on all nodes.

Configure Kubernetes’ apt repository. Replace 1.32 with the desired Kubernetes version.

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

Download the necessary packages.

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

Enable IP forwarding.

echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/k8s.conf
sudo sysctl --system

Install containerd and configure the systemd cgroup driver for runc.

sudo apt install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sed 's/SystemdCgroup = false/SystemdCgroup = true/' | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd

Initializing the Cluster and Installing a CNI Plugin

Next, we'll initialize the cluster from control-node-1 and install Cilium as a CNI (Container Network Interface) plugin. In the kubeadm init command, pass the IPv4 IP address of the load balancer as --control-plane-endpoint (shown as kube_api_lb_ip in terraform show) and the private IP address of node 1 as --apiserver-advertise-address.

sudo kubeadm init --control-plane-endpoint "KUBE_API_LB_IP:6443" --apiserver-advertise-address="10.11.12.21" --upload-certs

Next, set up your $HOME/.kube/config file as shown in the output of kubeadm init and keep the join commands in a safe place.

At this point, I usually check the nodes and pods in my cluster to see if everything looks good. It's perfectly normal that the node is NotReady and that the coredns pods are not yet running. But the other pods should be running.

kubectl get nodes -o wide
kubectl get pods -A -o wide

In our experience, Cilium is the most worry-free CNI plugin to install, so let's do that and admire some colorful ASCII art until it is ready.

wget https://github.com/cilium/cilium-cli/releases/download/v0.16.22/cilium-linux-amd64.tar.gz
tar xvf cilium-linux-amd64.tar.gz
./cilium install
./cilium status --wait

Now the node should be ready and coredns pods should also come up within a short amount of time.

Joining the Other Nodes

Now join the other two nodes into the cluster using kubeadm. The join command was shown in the kubeadm init output. Be sure to add the --apiserver-advertise-address="10.11.12.2x" using the respective private IPs (.22 and .23, shown as server_ips_private in terraform show).

sudo kubeadm join :6443 --token  \
  --discovery-token-ca-cert-hash sha256: \
  --control-plane --certificate-key  \
  --apiserver-advertise-address="10.11.12."

After a certain period, all nodes listed in kubectl get nodes -o wide should be marked as Ready, and the coredns pods should be running.
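If you would rather check this from a script than by eye, the JSON output of kubectl get nodes -o json contains a Ready condition per node. A minimal sketch that lists the nodes that are not ready yet; the function name is my own, and the sample is a heavily trimmed, made-up excerpt of the real output:

```python
import json


def not_ready_nodes(nodes_json: str) -> list[str]:
    """Return the names of nodes whose Ready condition is not "True"."""
    result = []
    for node in json.loads(nodes_json)["items"]:
        conditions = node["status"].get("conditions", [])
        ready = any(c["type"] == "Ready" and c["status"] == "True" for c in conditions)
        if not ready:
            result.append(node["metadata"]["name"])
    return result


# Trimmed, made-up sample of `kubectl get nodes -o json`
sample = json.dumps({
    "items": [
        {"metadata": {"name": "control-node-1"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "control-node-2"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]
})
print(not_ready_nodes(sample))  # ['control-node-2']
```

An empty list means all nodes have reported Ready.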

Seeing It in Action: Shutting Down a Node

Now, this blog post would, of course, not be complete without proving that we can take down a control node. I suggest that you copy the $HOME/.kube/config to your local machine and use kubectl from there.

Let’s list all pods in the cluster. Pay attention to the coredns pods. They’re most likely scheduled on control-node-1.

kubectl get pods -A -o wide

Now drain control-node-1:

kubectl drain control-node-1 --ignore-daemonsets

After a few seconds, the coredns pods should have been successfully moved to the other control nodes.

kubectl get pods -A -o wide

It’s now safe to shut down the drained node.

sudo init 0

If everything worked well, kubectl still works like a charm from your local machine and the other two control nodes. Another fun thing to do is to navigate to the private network named backend in the cloudscale Control Panel and look at the Ports tab. There, you'll see the network ports of the load balancer and the VMs. control-node-1 should be shown as down. The same applies to the "monitor_status" property if you query the pool members using our API.

After restarting the node, make it schedulable again.

kubectl uncordon control-node-1

And now you are ready to add worker nodes to the cluster, install our Cloud Controller Manager (CCM) to configure a Load Balancer for managing external traffic, or set up our Container Storage Interface (CSI) driver for persistent storage.

Lessons Learned and Final Words

In my initial test cluster, I made a mistake by not adding load balancer pool members for control-node-2 and control-node-3. As a result, when I shut down control-node-1, everything stopped working. So once again, I was reminded that HA systems are worthless if failover is not tested.

I hope this guide was interesting to you. If you find this type of content valuable, please send us an email, because this could be the beginning of a miniseries on running Kubernetes on cloudscale infrastructure. As mentioned earlier, there's much more to cover.

Have fun experimenting with Kubernetes on our infrastructure, but please read the complete documentation linked above before deploying production workloads!

Appendix: Terraform File

terraform {
  required_providers {
    cloudscale = {
      source  = "cloudscale-ch/cloudscale"
      version = "4.4.0"
    }
  }
}

provider "cloudscale" {
  # Add your provider configuration here if necessary
}

variable "control_node_count" {
  description = "Number of control nodes"
  type        = number
  default     = 3
}

variable "network_cidr" {
  description = "CIDR block for the backend network"
  type        = string
  default     = "10.11.12.0/24"
}

variable "zone_slug" {
  description = "Zone slug for the resources"
  type        = string
  default     = "lpg1"
}

variable "ssh_key_path" {
  description = "Path to the SSH public key file"
  type        = string
  default     = "~/.ssh/id_ed25519.pub" # Replace with your SSH key file path
}

# Create a network
resource "cloudscale_network" "backend" {
  name                    = "backend"
  zone_slug               = var.zone_slug
  auto_create_ipv4_subnet = "false"
}

# Create a subnet
resource "cloudscale_subnet" "backend-subnet" {
  cidr         = var.network_cidr
  network_uuid = cloudscale_network.backend.id
}

# Server Group for Control Nodes
resource "cloudscale_server_group" "control-plane-group" {
  name      = "control-plane-group"
  type      = "anti-affinity"
  zone_slug = var.zone_slug
}

# Control Nodes
resource "cloudscale_server" "control-nodes" {
  count            = var.control_node_count
  name             = "control-node-${count.index + 1}"
  flavor_slug      = "flex-8-4"
  image_slug       = "ubuntu-24.04"
  volume_size_gb   = 50
  ssh_keys         = [file(var.ssh_key_path)]
  server_group_ids = [cloudscale_server_group.control-plane-group.id]
  zone_slug        = var.zone_slug

  interfaces {
    type = "public"
  }

  interfaces {
    type = "private"
    addresses {
      subnet_uuid = cloudscale_subnet.backend-subnet.id
      address     = "10.11.12.${count.index + 21}"
    }
  }
}

# Kube-API Load Balancer
resource "cloudscale_load_balancer" "kube-api-lb" {
  name        = "kube-api-lb"
  flavor_slug = "lb-standard"
  zone_slug   = var.zone_slug
}


# Create a load balancer pool
resource "cloudscale_load_balancer_pool" "kube-api-pool" {
  name               = "kube-api-pool"
  algorithm          = "round_robin"
  protocol           = "tcp"
  load_balancer_uuid = cloudscale_load_balancer.kube-api-lb.id
}

# Create a load balancer listener
resource "cloudscale_load_balancer_listener" "kube-api-listener" {
  name          = "kube-api-listener"
  pool_uuid     = cloudscale_load_balancer_pool.kube-api-pool.id
  protocol      = "tcp"
  protocol_port = 6443
}


# Create a load balancer pool member
resource "cloudscale_load_balancer_pool_member" "kube-api-pool-member" {
  count         = var.control_node_count
  name          = "kube-api-${count.index}"
  pool_uuid     = cloudscale_load_balancer_pool.kube-api-pool.id
  protocol_port = 6443

  # Get the private IP address of the control node
  address = flatten([
    for iface in cloudscale_server.control-nodes[count.index].interfaces : [
      for addr in iface.addresses : addr.address
      if iface.type == "private"
    ]
  ])[0]
  subnet_uuid = cloudscale_subnet.backend-subnet.id
}

# Create a load balancer health monitor
resource "cloudscale_load_balancer_health_monitor" "lb1-health-monitor" {
  pool_uuid = cloudscale_load_balancer_pool.kube-api-pool.id
  type      = "tcp"
}


output "kube_api_lb_ip" {
  value       = cloudscale_load_balancer.kube-api-lb.vip_addresses[0].address
  description = "IPv4 Address of the Load Balancer"
}

output "server_ips_public" {
  value = [
    for node in cloudscale_server.control-nodes :
    flatten([
      for iface in node.interfaces : [
        for addr in iface.addresses : addr.address
        if iface.type == "public"
      ]
    ])[0]
  ]
  description = "The public IP addresses of the control nodes."
}

output "server_ips_private" {
  value = [
    for node in cloudscale_server.control-nodes :
    flatten([
      for iface in node.interfaces : [
        for addr in iface.addresses : addr.address
        if iface.type == "private"
      ]
    ])[0]
  ]
  description = "The private IP addresses of the control nodes."
}

Executable Ansible Playbooks

Thu, 12 Dec 2024 00:00:00 GMT

We use Ansible to provision, configure, and orchestrate our infrastructure. The code has grown over the years, and systems have sprung up that we need to interact with when a playbook is running. We manage our inventory in Netbox, we record playbook runs using ARA, and store secrets in Vault/OpenBao.

As a result, our playbooks require more and more knowledge about our environment, and the chance of "holding them wrong" increases.

How it Started

Like most Ansible users, we initially called all playbooks like this:

$ ansible-playbook playbooks/state/all.yml -i inventory/prod-rma1 -l www --diff

This works great, but it is more verbose than necessary. Some on our team, including (but not limited to) me, prefer short commands over long ones, especially if they are used frequently.

So we took our existing playbooks like this one:

- name: Base network and boot setup
  hosts: '{{ playbook_hosts | default("ansible-managed") }}'

And added a shebang:

#!/usr/bin/env ansible-playbook
- name: Base network and boot setup
  hosts: '{{ playbook_hosts | default("ansible-managed") }}'

Then we made the file executable:

$ chmod +x playbooks/state/all.yml

And presto, we could drop the command name from our CLI, calling the same playbook as follows:

$ playbooks/state/all.yml -i inventory/prod-rma1 -l www

How it is Going

When integrating ARA logging, we discovered that operators would always have to pass --diff / -D for differences to actually be logged.

We figured that typing --diff is not something we wanted to add to our documentation. We wanted to enforce it. We also had some configuration we needed to apply, depending on the inventory selection.

Maybe there is another way, but we figured: Since we already use our playbooks somewhat like CLIs, why not wrap ansible-playbook and go all-in?

So that's what we did. We wrote our own Python CLI called ap, which inspects the arguments destined for ansible-playbook and introduces some arguments of its own for a better user experience.

The script uses Typer and looks roughly as follows (this is a minimized version to illustrate the concept):

import os
import subprocess
import sys

from typer import Context
from typer import Option
from typer import Typer
from typing import Annotated


cli = Typer(add_completion=False, add_help_option=False, no_args_is_help=True)


@cli.command(name='ap', context_settings={
    "allow_extra_args": True,
    "ignore_unknown_options": True
}, add_help_option=False, no_args_is_help=True)
def main(
    ctx: Context,
    inventory: Annotated[list[str], Option("--inventory", "-i", help=(
        "Inventories to use, each either a site or a full inventory path"
    ))] = [],
) -> None:
    args = ['ansible-playbook', *ctx.args]

    # Configure inventories
    for i in inventory:
        i = i.removeprefix('inventory/')

        args.append('-i')
        args.append(f'inventory/{i}')

    # Always use diff
    if '-D' not in args and '--diff' not in args:
        args.append('--diff')

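    # Configure systems (the *_env helpers are omitted in this minimized
    # version; each is expected to return a dict of environment variables)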
    os.environ.update(ara_env(args))
    os.environ.update(netbox_env(args))
    os.environ.update(secrets_env(args))

    # Execute
    result = subprocess.run(args, env=os.environ)
    sys.exit(result.returncode)


if __name__ == '__main__':
    cli()

Aside from enforcing arguments, this also gave us the flexibility to shorten our inventory calls a little, as the inventory prefix is really always the same. So while we started with this:

$ ansible-playbook playbooks/state/all.yml -i inventory/prod-rma1 -l www --diff

We can now call this, which is equivalent:

$ playbooks/state/all.yml -i prod-rma1 -l www
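The argument rewriting that makes the two commands equivalent can be pulled out into a small pure function, which is easy to test without running Ansible. This is only a sketch derived from the minimized script above; the name build_args is hypothetical and not part of our actual CLI:

```python
def build_args(extra_args: list[str], inventories: list[str]) -> list[str]:
    """Return the ansible-playbook command line for the given ap arguments."""
    args = ['ansible-playbook', *extra_args]

    # Accept both "prod-rma1" and "inventory/prod-rma1"
    for inventory in inventories:
        name = inventory.removeprefix('inventory/')
        args += ['-i', f'inventory/{name}']

    # Always use diff, but do not add it twice
    if '-D' not in args and '--diff' not in args:
        args.append('--diff')

    return args


short = build_args(['playbooks/state/all.yml', '-l', 'www'], ['prod-rma1'])
full = build_args(['playbooks/state/all.yml', '-l', 'www'], ['inventory/prod-rma1'])
print(short)
```

Both calls yield the same argument list, so the short and the long inventory spelling behave identically.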

And now we can use all these saved keystrokes on useful things, like our engineering blog!

pyastgrep: Searching Python Code Using the AST and XPath

Fri, 29 Nov 2024 00:00:00 GMT

A Medium-Sized Software Project

Here at cloudscale, I mainly work on the control panel, a medium-sized Python/TypeScript application. The control panel is the application available at control.cloudscale.ch. Development began 8 years ago as a fresh start, and the application has been continuously developed and maintained ever since.

In this article, I focus on the Python part, which today comprises ~120 k lines of code:

$ git ls-files '*.py' | wc -l
     674
$ git ls-files '*.py' | parallel -Xj1 cat | wc -l
  119248

Finding Places in the Code

As with any software project, working on the control panel regularly requires finding or adjusting pieces of code that are scattered widely across the application. For example, I might want to find out whether an internal API is used in a particular way, or whether it is still used at all. Or I might want to check whether an outdated or problematic code pattern I have discovered also occurs elsewhere in the application. The goal is always to make the code easier to understand and easier to extend and maintain.

The first tool I reach for is my IDE's "Search & Replace" or another, similar tool. With regular expressions, I can search for places in the code with little effort. This approach is very fast and therefore interactive. The speed of git grep is unlikely to be matched by any tool that first has to parse the Python source:

$ time git grep -P '\bsend_email\(' '*.py' > /dev/null

real	0m0.030s
user    0m0.014s
sys     0m0.098s

But regular expressions are not suitable when more complex relationships in the code need to be recognized. One example would be finding all uses of a function where an optional argument is specified. There are better-suited tools for this, although they require more preparation to use. In the following, I show one of the tools I have worked with a lot in recent months.
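To make that limitation concrete before moving on, here is a small sketch using Python's re module with the same pattern as the git grep call above. The sample lines are made up for illustration:

```python
import re

# Same pattern as in the git grep example
pattern = re.compile(r'\bsend_email\(')

# Hypothetical source lines, not taken from the control panel
lines = [
    'email.send_email(',
    '    to_address,',
    '    reply_to_address=sender,',
    ')',
    '# send_email() is deprecated',
]

matches = [line for line in lines if pattern.search(line)]
print(matches)
```

The pattern matches the first line of the call and the comment alike. It has no notion that reply_to_address two lines further down belongs to the same call, which is exactly the kind of relationship a syntax tree captures.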

pyastgrep

pyastgrep is a library and a CLI application that can be used to search Python code based on its AST (abstract syntax tree). Internally, pyastgrep exposes the Python AST of a single source file or of an entire directory as an XML tree, which can then be searched with XPath expressions. It is in any case very helpful to keep the documentation of the Python ast module and an XPath cheat sheet at hand.

As an example, I show how to find all places in the code where the function send_email() is called and a value is given for the optional argument reply_to_address. To do this, I build up an XPath expression step by step.

In the first step, I select all calls to functions named send_email().

$ pyastgrep './/Call[func/Name[@id="send_email"]]' src
src/db/access/member_helper.py:57:9:        send_email(
src/services/openstack/functions.py:103:5:    send_email(

Call and Name are the nodes in the syntax tree that represent a function call and the use of a global or local variable, respectively. Name[@id="send_email"] selects all uses of variables named send_email.
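The node and attribute names used in the XPath expression come straight from Python's ast module, so it can help to inspect a small call with the standard library first. A minimal sketch, where the source string is a made-up example:

```python
import ast

# Hypothetical call, standing in for real control-panel code
source = 'send_email(to="a@example.com", reply_to_address="b@example.com")'

tree = ast.parse(source)
call = tree.body[0].value  # the Expr statement wraps the Call node

print(type(call).__name__)       # node type matched by .//Call
print(type(call.func).__name__)  # child matched by func/Name
print(call.func.id)              # attribute matched by @id
keyword_names = [kw.arg for kw in call.keywords]
print(keyword_names)
```

The func field of the Call holds a Name node whose id is the function name, which is what Call[func/Name[@id="send_email"]] expresses in XPath.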

So far, so good! But I know that far more places in the code call this function. The problem is that the function can be called either as send_email() or as email.send_email(). Since this is the only function with this name, I can be somewhat imprecise and select all places where a function with this name is called on any object or module:

$ pyastgrep './/Call[func[Name[@id="send_email"] or Attribute[@attr="send_email"]]]' src
src/panel/signals.py:18:13:            email.send_email(
src/panel/invoices/__init__.py:169:30:    to_address, cc_address = email.send_email(
src/panel/payment/__init__.py:46:9:        email.send_email(
src/panel/email/tests/test_template_rendering.py:15:5:    email.send_email(
src/panel/billing/notifications.py:28:30:    to_address, cc_address = email.send_email(
[... 10 more results]

Attribute nodes are those where the . operator is used to access an attribute of another object, as in email.send_email. func[... or ...] selects the function calls of both variants (local/global variable and attribute).

Finally, I narrow the search to all places where the keyword-only argument reply_to_address is passed:

$ pyastgrep './/Call[func[Name[@id="send_email"] or Attribute[@attr="send_email"]] and keywords/keyword[@arg="reply_to_address"]]' src
src/panel/signals.py:18:13:            email.send_email(
src/panel/billing/notifications.py:51:5:    email.send_email(
src/project/tests/test_email_backend.py:30:5:    email.send_email(
src/db/access/member_helper.py:57:9:        send_email(
src/db/access/user/tickets.py:51:9:        email.send_email(
[... 5 more results]

Call[... and ...] selects all function calls that satisfy both conditions (function name and presence of the keyword argument). keywords/keyword[...] iterates over all keyword arguments of the function call. @arg="reply_to_address" selects the keyword arguments that use the keyword reply_to_address (send_email(..., reply_to_address=...)).

The condition on the keyword argument can also easily be inverted. As a last example, I select all calls to send_email() where the argument reply_to_address is not passed:

$ pyastgrep './/Call[func[Name[@id="send_email"] or Attribute[@attr="send_email"]] and not(keywords/keyword[@arg="reply_to_address"])]' src
src/panel/invoices/__init__.py:169:30:    to_address, cc_address = email.send_email(
src/panel/payment/__init__.py:46:9:        email.send_email(
src/panel/email/tests/test_template_rendering.py:15:5:    email.send_email(
src/panel/billing/notifications.py:28:30:    to_address, cc_address = email.send_email(
src/services/openstack/functions.py:103:5:    send_email(
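For comparison, roughly the same two queries can be written with the standard library's ast module alone. This is a sketch of the idea, not how pyastgrep works internally; the helper names and the sample source are hypothetical:

```python
import ast


def is_call_to(call: ast.Call, name: str) -> bool:
    """True for both name(...) and <anything>.name(...)."""
    func = call.func
    if isinstance(func, ast.Name):
        return func.id == name
    if isinstance(func, ast.Attribute):
        return func.attr == name
    return False


def split_by_keyword(source: str, name: str, keyword: str):
    """Return line numbers of calls to `name` with and without `keyword`."""
    with_kw, without_kw = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and is_call_to(node, name):
            has_kw = any(kw.arg == keyword for kw in node.keywords)
            (with_kw if has_kw else without_kw).append(node.lineno)
    return sorted(with_kw), sorted(without_kw)


EXAMPLE = '''\
send_email("a@example.com")
email.send_email("a@example.com", reply_to_address="b@example.com")
other("a@example.com", reply_to_address="b@example.com")
'''

with_kw, without_kw = split_by_keyword(EXAMPLE, "send_email", "reply_to_address")
print(with_kw, without_kw)
```

The XPath version is considerably shorter and can be adjusted interactively on the command line, which is why I prefer it for ad-hoc searches.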

Accuracy Is Always a Trade-Off

To conclude, I would like to note that the example shown above can produce wrong results, that is, too many or too few, for various reasons. Each of these deviations can be addressed, with varying effort and not perfectly in every case. Here are a few examples of inaccuracies that I have accepted in the example above:

Besides the function we are looking for, there could be other functions named send_email. To counter this, the import statements in every source file as well as the presence of local variables would have to be analyzed.
The function send_email could have been imported under a different name in a file, e.g. with from panel.email import send_email as send_email_, perhaps to avoid a naming conflict. This, too, would require analyzing the import statements.
The argument reply_to_address could be passed as a positional argument. In that case, the argument would have to be selected by its position in the argument list instead of by the keyword reply_to_address.
The argument reply_to_address could be passed dynamically via **kwargs. This case is very difficult to detect completely in an automated way. In our case, it would have been most effective to automatically find the places where **kwargs is used and then check them manually.
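The last point can at least be narrowed down automatically. In Python's AST, a **mapping argument appears as a keyword node whose arg is None, and a *iterable argument appears as a Starred node among the positional arguments. The following sketch uses the standard ast module to list such calls for manual review; the function name and the sample source are hypothetical:

```python
import ast


def calls_needing_review(source: str, name: str = "send_email") -> list[int]:
    """Line numbers of calls to `name` that unpack *args or **kwargs."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            called = func.id
        elif isinstance(func, ast.Attribute):
            called = func.attr
        else:
            continue
        if called != name:
            continue
        # `**mapping` is stored as a keyword whose arg is None
        unpacks_mapping = any(kw.arg is None for kw in node.keywords)
        # `*iterable` is stored as a Starred node in the positional args
        unpacks_iterable = any(isinstance(a, ast.Starred) for a in node.args)
        if unpacks_mapping or unpacks_iterable:
            flagged.append(node.lineno)
    return sorted(flagged)


EXAMPLE = '''\
send_email("a@example.com", reply_to_address="b@example.com")
email.send_email("a@example.com", **options)
send_email(*args)
'''

flagged = calls_needing_review(EXAMPLE)
print(flagged)
```

Whether the unpacked values actually contain reply_to_address still has to be checked by hand.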

In most cases, there is no perfect solution, or the effort outweighs the benefit. In those cases, one is forced to weigh accuracy, flexibility, and effort against each other. In my experience, the larger an application becomes, the more weight accuracy and flexibility gain.

Filling the Fridge - My onboarding @ cloudscale

Thu, 28 Nov 2024 00:00:00 GMT

Place and Position the Fridge

My onboarding started with a one-on-one session with the Team Lead and included a mix of setup activities:

Unboxing my personal hardware, basic MacBook and backup setup
Creating accounts for internal systems like LDAP, VPN, SSH, etc., according to the password policy
Reading and signing papers and more

While commuting from Biel to Zürich and in the home office, I could individualize my setup and familiarize myself with company tools and workflows. I also got time to start working through the cloud exercises available on GitHub, which gave me a better understanding of how the API works and how customers interact with our system.

Let the Fridge Settle

Once I had everything set up, I was introduced to the software projects I would be working on at cloudscale. In one of the next daily standups I was also assigned my first small task: Include Server Name in Extra Traffic Transaction Description. Soon after, I was introduced to the code review / QA process, which enabled me to review and test the work of my teammates.

One by one I attended the company's different meeting formats:

One-on-One: Weekly retrospective with my team lead
Sprint planning: Every two weeks the team discusses and decides the scope of the next sprint
Dev Team Insights: Each month our team presents an interesting insight, the last one was about how we use end-to-end testing in current projects
All-Hands Retrospective: A workshop where the whole company sits together to identify roadblocks and propose solutions for resolving them
Brownbag: A voluntary format in which an employee presents a topic (e.g.: Cloud Native Days), which usually takes place over lunchtime, hence the name

Set the Temperature

A key experience for understanding the company was the introduction day with Mänu, the CEO of cloudscale. We took a dive into the company’s history, structure, and where we are located in the market and the cloud pyramid, covering everything from our server centers to the networks and topologies that support our infrastructure. Alongside historical and technical insights, I learned about cloudscale's core values:

Quality - go the extra mile
360° Transparency - Internally and externally
Privacy / Security - From software to communication channels
Swissness - Location, reliability and secrecy
Simplicity / Approachability - Being on eye level with customers, business partners and coworkers

Load Beverages Gradually

To avoid overloading a fridge, the beverages should not be loaded all at once. Analogously, I was gradually introduced to further important topics and given more responsibility/autonomy:

Further sessions about implementation details in our projects
Working on bigger tasks
Writing this blog post
Information Security Management System (ISMS) and ISO/IEC 27002
Introduction to the system engineering team and which technologies they are using

Once the above topics are cooled down, the fridge can be loaded further:

Visiting the server centers
Introduction to Support
Holding the next Dev Team Insights Presentation
Organizing All-Hands Retrospective Meeting

Enjoy a cold one

For me, switching jobs triggered some insecurities, but thanks to a well-structured onboarding process and a very supportive team, the transition to cloudscale has been a smooth experience. Cheers!

Disclaimer: We just got a shiny new fridge with tasty beverages. I was not involved in the actual fridge project, and it was filled by the CEO himself.

Staggering Restarts in Ceph

Wed, 27 Nov 2024 00:00:00 GMT

When customers use a disk in our cloud, they talk to one of our Ceph storage clusters. Every byte written is sent to one, and every byte read is received from one. The underlying physical hardware is abstracted away, shielding the VMs from unexpected disk failures and planned maintenance procedures.

To manage our clusters, we sometimes have to restart their services. Here's how we do that while minimizing customer impact.

Impact of Restarts

When a single disk dies, it typically takes one or two OSDs with it. This can rarely be detected by our customers - after all, this is what Ceph is all about.

However, restarting a lot of OSDs concurrently causes throughput drops and latency spikes. A drop in throughput can be noticed because a PostgreSQL server might suddenly be much slower in scanning its tables. A latency spike is especially noticeable in clusters like Etcd, where high latency might cause leader elections.

💡 Tip

If you use Etcd in the cloud, tuning its time parameters may help avoid unnecessary leader elections:

https://etcd.io/docs/v3.4/tuning/

https://www.redhat.com/en/blog/introducing-selectable-profiles-for-etcd

If we restart as many OSDs as possible simultaneously (a third of a cluster), the impact on customer VMs is quite visible in benchmarks:

If we manually stagger the OSD restarts five seconds apart, we get more spread out numbers, especially when it comes to latency:

The effect on throughput is not too dramatic, but that's a function of the cluster size. On our smaller, internal clusters, the effect is more pronounced:

By staggering starts, we spread out the negative effects:

Note that staggering stops has no positive impact. Ceph uses kill -9 on its OSDs when stopping them and it is designed to deal well with suddenly disappearing OSDs.
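The manual procedure described above boils down to starting the OSDs of a host one after another with a fixed pause in between. As a rough sketch only, with a hypothetical start callback standing in for whatever actually starts an OSD service, and without any of the cluster-wide coordination discussed in the next section:

```python
import time
from typing import Callable, Iterable


def start_staggered(
    osd_ids: Iterable[int],
    start: Callable[[int], None],
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call `start` for each OSD, pausing `delay` seconds between starts."""
    for index, osd_id in enumerate(osd_ids):
        if index > 0:
            sleep(delay)  # no pause before the very first OSD
        start(osd_id)


# Dry run: record what would happen instead of starting anything
started: list[int] = []
pauses: list[float] = []
start_staggered([11, 12, 13], started.append, delay=5.0, sleep=pauses.append)
print(started, pauses)
```

Passing sleep as a parameter keeps the dry run instant. In real use, the default time.sleep would apply.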

Automating Staggered Restarts

While we are able to manually start OSDs in a staggered fashion, we have situations where we cannot do that. We want staggered OSD starts when a host unexpectedly reboots, when it gets reinstalled, when we do a package upgrade and so on. Ideally, we don't want to have to think about it.

To achieve this, we wrote a Python script that uses cluster-wide locking to ensure that OSDs are started slightly apart:

stagger-osds.py