Engineering Blog
Provisioning Kubernetes clusters on cloudscale with Cluster API
With the new Cluster API infrastructure provider for cloudscale (CAPCS), Kubernetes clusters become declarative resources you can provision, scale, upgrade, and delete through reconciliation. We cover the architecture and a hands-on walkthrough, from setup to Day 2 operations.
If you are operating Kubernetes clusters, you probably know the burden of managing multiple tools and manual steps spread across the different lifecycle operations. Cluster API (CAPI) solves this issue by managing clusters in Kubernetes itself. Instead of managing clusters through ad-hoc scripts and infrastructure tooling, clusters themselves become Kubernetes resources that can be created, upgraded, scaled, and deleted declaratively. This is done using Kubernetes-style controllers which continuously reconcile the desired cluster state against the actual infrastructure state.
This post assumes working knowledge of Kubernetes — in particular controllers, custom resources, and the reconciliation model. You won't need prior Cluster API experience; we'll introduce its components as we go.
With the cloudscale Cluster API infrastructure provider (CAPCS), it is now possible to provision Cluster API clusters on cloudscale infrastructure. CAPCS complements the existing cloudscale CCM and CSI integrations by extending cloudscale support into cluster lifecycle management itself.
Cluster API provides the following:
- a unified declarative way to bootstrap and manage Kubernetes clusters
- an abstraction over infrastructure providers such as cloudscale, AWS, Google, Azure, OpenStack, etc.
In practice, Cluster API allows platform teams to manage Kubernetes clusters similarly to how Kubernetes manages applications.
Key Terms:
- a management cluster owns all Cluster API custom resources and is where the CAPI controllers run
- one or more workload clusters are managed by the management cluster and are where the workload applications run
You may wonder why Kubernetes is required to create Kubernetes. Cluster API controllers run inside a Kubernetes cluster themselves, which is why a management cluster is required. However, management clusters are not special. They can be workload clusters first, and then promoted to be self-managed later.
Cluster API is intentionally modular. Different providers can implement infrastructure provisioning, node bootstrapping, and control plane management independently:
- CAPI core: Core lifecycle management controllers
- Infrastructure Providers: Responsible for creating and managing infrastructure resources such as virtual machines, networks, and load balancers. For example: cloudscale (CAPCS), Azure (CAPZ).
- Bootstrap Providers: Generate the configuration required for machines to join a Kubernetes cluster as nodes. For example: Kubeadm (CABPK), MicroK8s, Talos.
- Control Plane Providers: Manage the control plane components of Kubernetes clusters and carry out lifecycle operations such as scaling and upgrades. For example: Kubeadm (KCP), Talos.
So far, cloudscale has been "Kubernetes ready" through its CCM and CSI integrations, but customers had to set up the whole stack and its lifecycle management themselves. Now, the Cluster API infrastructure provider for cloudscale (CAPCS) lets you seamlessly provision and manage clusters on our infrastructure. Follow the tutorial below to try it out!
Tutorial
Prerequisites
- An existing management cluster. We'll use kind.
- kubectl
- clusterctl version >= 1.13.0.
- A read/write cloudscale API token in the project where clusters will live. You can create this in the Control Panel.
We'll be setting up a kubeadm-based cluster.
1. Build and import a custom OS image
The node image must already be kubeadm-ready (kubelet, kubeadm, containerd, ...). At the time of writing, CAPCS expects users to build a compatible image on their own.
You can build one with image-builder for OpenStack by following their Quickstart tutorial.
Once you've built an image, import it into the cloudscale Control Panel under "Custom Images".
Select "QCOW2" as the source format and "Passthrough" for user data handling. "BIOS" can be used for the Firmware Type.
Test Images
If you'd rather skip the image build for a quick try, we host pre-built test images until 2026-07-31.
⚠️ Not suitable for production. These images are provided as-is, and will stop being served after the date above.
To use one, go to the Control Panel → Custom Images → Import a Custom Image, paste the URL for your desired Kubernetes version below, and apply the same settings as in the previous step (QCOW2 source format, Passthrough user data, BIOS firmware).
| Kubernetes version | Image URL (paste into Control Panel) |
|---|---|
| v1.34.8 | https://capcs-test-images.objects.lpg.cloudscale.ch/ubuntu-2404-kube-v1.34.8 |
| v1.35.5 | https://capcs-test-images.objects.lpg.cloudscale.ch/ubuntu-2404-kube-v1.35.5 |
| v1.36.1 | https://capcs-test-images.objects.lpg.cloudscale.ch/ubuntu-2404-kube-v1.36.1 |
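Every URL in the table follows the same pattern, so if you script around the test images you can derive both the import URL and the later `CLOUDSCALE_MACHINE_IMAGE` value from a single version string. A minimal sketch; the helper names are hypothetical, and it assumes you name the imported image after the URL's last path segment, which matches the image name used in step 3:

```shell
# Base URL of the hosted test images (taken from the table above).
CAPCS_TEST_IMAGE_BASE="https://capcs-test-images.objects.lpg.cloudscale.ch"

# Image name for a Kubernetes version such as v1.34.8.
# Assumption: the imported custom image is named like the URL's last segment.
capcs_image_name() {
  printf 'ubuntu-2404-kube-%s\n' "$1"
}

# URL to paste into Control Panel → Custom Images → Import a Custom Image.
capcs_test_image_url() {
  printf '%s/%s\n' "$CAPCS_TEST_IMAGE_BASE" "$(capcs_image_name "$1")"
}

# Value for CLOUDSCALE_MACHINE_IMAGE (custom images carry the custom: prefix).
capcs_machine_image() {
  printf 'custom:%s\n' "$(capcs_image_name "$1")"
}
```

For example, `capcs_test_image_url v1.35.5` prints the URL from the second row of the table.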
2. Initialize the management cluster
Create a local kind cluster, if you don't have an existing cluster yet, and wait until it's ready:
$ kind create cluster
# wait a couple of seconds/minutes until cluster-info returns a healthy endpoint
$ kubectl cluster-info
Make sure CLOUDSCALE_API_TOKEN is exported in your environment, or configure the variable in the clusterctl configuration file at $XDG_CONFIG_HOME/cluster-api/clusterctl.yaml.
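If you go the configuration-file route, clusterctl reads variables such as `CLOUDSCALE_API_TOKEN` from top-level keys in `clusterctl.yaml`. A minimal sketch that writes the token there with owner-only permissions; the helper name is hypothetical, the `~/.config` fallback for an unset `$XDG_CONFIG_HOME` is an assumption (it matches the usual Linux default), and the function refuses to touch an existing file so other settings are not lost:

```shell
# Hypothetical helper: store CLOUDSCALE_API_TOKEN in clusterctl's config file.
# Assumption: $XDG_CONFIG_HOME falls back to ~/.config when unset (Linux default).
write_clusterctl_token() {
  cfg_dir="${1:-${XDG_CONFIG_HOME:-$HOME/.config}/cluster-api}"
  cfg_file="$cfg_dir/clusterctl.yaml"
  if [ -z "${CLOUDSCALE_API_TOKEN:-}" ]; then
    echo "CLOUDSCALE_API_TOKEN is not set" >&2
    return 1
  fi
  if [ -e "$cfg_file" ]; then
    # Never overwrite an existing config; it may hold other settings.
    echo "$cfg_file already exists, add the key there manually" >&2
    return 1
  fi
  mkdir -p "$cfg_dir" || return 1
  # umask 077 in a subshell: the file is created readable by the owner only.
  (umask 077 && printf 'CLOUDSCALE_API_TOKEN: "%s"\n' "$CLOUDSCALE_API_TOKEN" > "$cfg_file")
}
```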
Then initialize the management cluster:
# export CLOUDSCALE_API_TOKEN if necessary
export CLOUDSCALE_API_TOKEN="REDACTED"
# initialize cluster api components
clusterctl init --infrastructure cloudscale-ch-cloudscale
This will set up several components. Feel free to explore them! Interesting namespaces:
- capi-system: CAPI core
- capcs-system: CAPCS
- capi-kubeadm-bootstrap-system: Kubeadm bootstrap provider
- capi-kubeadm-control-plane-system: Kubeadm control plane provider
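Before creating a workload cluster, you may want to wait until the controllers in these namespaces are actually running. A sketch using `kubectl wait` on all Deployments in each namespace; the function name and the five-minute timeout are our own choices:

```shell
# Namespaces created by `clusterctl init` for this provider combination.
CAPI_NAMESPACES="capi-system capcs-system capi-kubeadm-bootstrap-system capi-kubeadm-control-plane-system"

# Hypothetical helper: block until every Deployment in those namespaces is Available.
wait_for_capi_controllers() {
  for ns in $CAPI_NAMESPACES; do
    echo "waiting for controllers in $ns"
    kubectl wait deployment --all --namespace "$ns" \
      --for=condition=Available --timeout=5m || return 1
  done
}
```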
3. Create your first workload cluster
Set up the required configuration:
# SSH public key added to nodes
export CLOUDSCALE_SSH_PUBLIC_KEY="ssh-ed25519 AAAA..."
# cloudscale.ch region
export CLOUDSCALE_REGION="lpg"
# Server image for nodes
export CLOUDSCALE_MACHINE_IMAGE="custom:ubuntu-2404-kube-v1.34.8"
# Flavor for control plane nodes
export CLOUDSCALE_CONTROL_PLANE_MACHINE_FLAVOR="flex-4-2"
# Flavor for worker nodes
export CLOUDSCALE_WORKER_MACHINE_FLAVOR="flex-4-2"
# Root volume size in GB
export CLOUDSCALE_ROOT_VOLUME_SIZE="50"
Make sure CLOUDSCALE_MACHINE_IMAGE is custom: followed by the name of your imported custom image (in this example, the image was imported as ubuntu-2404-kube-v1.34.8).
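Because `clusterctl generate cluster` substitutes these variables into the cluster template, a missing export tends to surface only later. A small pre-flight check, as a sketch; the function name is hypothetical, the variable list mirrors the exports above, and the checks on the image prefix and volume size reflect the values used in this tutorial rather than a documented CAPCS contract:

```shell
# Variables exported in the step above (one entry per export).
QUICKSTART_VARS="CLOUDSCALE_SSH_PUBLIC_KEY CLOUDSCALE_REGION CLOUDSCALE_MACHINE_IMAGE
CLOUDSCALE_CONTROL_PLANE_MACHINE_FLAVOR CLOUDSCALE_WORKER_MACHINE_FLAVOR
CLOUDSCALE_ROOT_VOLUME_SIZE"

# Hypothetical pre-flight check; returns non-zero and explains what is wrong.
check_quickstart_env() {
  missing=""
  for var in $QUICKSTART_VARS; do
    # printenv only sees exported variables, which is what clusterctl reads.
    printenv "$var" >/dev/null || missing="$missing $var"
  done
  if [ -n "$missing" ]; then
    echo "not exported:$missing" >&2
    return 1
  fi
  case "$CLOUDSCALE_MACHINE_IMAGE" in
    custom:?*) ;;
    *) echo "CLOUDSCALE_MACHINE_IMAGE should look like custom:<image-name>" >&2; return 1 ;;
  esac
  case "$CLOUDSCALE_ROOT_VOLUME_SIZE" in
    ''|*[!0-9]*) echo "CLOUDSCALE_ROOT_VOLUME_SIZE must be a whole number of GB" >&2; return 1 ;;
  esac
}
```

Run `check_quickstart_env` before generating the manifest. To see which variables the template itself expects, `clusterctl generate cluster quickstart --list-variables` prints the authoritative list.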
Generate cluster YAML:
clusterctl generate cluster quickstart \
--kubernetes-version v1.34.8 \
--control-plane-machine-count=3 \
--worker-machine-count=3 \
> quickstart.yaml
We provision three control plane nodes for a highly available setup: three is the smallest control plane whose etcd quorum survives the loss of one member. Because the API server needs a single, stable endpoint regardless of which control plane machines are currently running, CAPCS automatically provisions a Load Balancer (see the CloudscaleCluster resource below) with a dedicated IP address. This IP serves as the primary API endpoint.
Inspect the generated YAML to learn how the custom resources work together to define a cluster. Then, run the following command to apply the manifest:
kubectl apply -f quickstart.yaml
What actually gets added?
The cluster consists of a number of subresources. You can show all of them and their status with the following command:
kubectl get cluster,cloudscalecluster,kubeadmcontrolplane,cloudscalemachinetemplate,machinedeployment,machineset,kubeadmconfigtemplate,machine,cloudscalemachine
Resource explanation
| Resource | Description |
|---|---|
| Cluster | Workload cluster definition |
| CloudscaleCluster | cloudscale-specific setup (Networks, LoadBalancer, FloatingIP API, ...) |
| KubeadmControlPlane | Kubeadm configuration for control plane nodes |
| CloudscaleMachineTemplate | cloudscale server specifics: Flavor, Image, ... |
| MachineDeployment | Manages a set of Machines (akin to a Kubernetes Deployment) |
| MachineSet | Similar to a Kubernetes ReplicaSet, for Machines |
| Machine | Represents the lifecycle of a cluster node and its backing infrastructure |
| CloudscaleMachine | Represents the actual cloudscale server resource backing the Machine |
| KubeadmConfigTemplate | Kubeadm configuration for worker nodes |
The API server endpoint is managed entirely by the CloudscaleCluster. CAPCS allocates a Load Balancer and uses this service as the control plane endpoint, ensuring stable connectivity even when underlying control plane machines are replaced (for instance, during an upgrade).
4. Make the cluster ready
The cluster will now start provisioning. You can observe the cluster status as follows:
kubectl get cluster quickstart
clusterctl describe cluster quickstart
As soon as the PHASE column from kubectl get cluster shows Provisioned, you can fetch the kubeconfig for accessing the workload cluster:
clusterctl get kubeconfig quickstart > ~/.kube/quickstart.yaml
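When scripting this step, you can poll the phase before fetching the kubeconfig. A minimal sketch; the function name, the 10-second interval, and the 90 attempts (roughly 15 minutes) are arbitrary assumptions. On recent kubectl versions, `kubectl wait --for=jsonpath='{.status.phase}'=Provisioned cluster/quickstart` is a one-line alternative:

```shell
# Hypothetical helper: poll .status.phase of a Cluster until it matches.
# WAIT_ATTEMPTS and WAIT_INTERVAL (seconds) can be overridden from the environment.
wait_for_cluster_phase() {
  cluster="$1"
  want="${2:-Provisioned}"
  attempts="${WAIT_ATTEMPTS:-90}"
  while [ "$attempts" -gt 0 ]; do
    phase=$(kubectl get cluster "$cluster" -o jsonpath='{.status.phase}')
    [ "$phase" = "$want" ] && return 0
    attempts=$((attempts - 1))
    sleep "${WAIT_INTERVAL:-10}"
  done
  echo "cluster $cluster did not reach phase $want (last seen: $phase)" >&2
  return 1
}
```

Usage: `wait_for_cluster_phase quickstart && clusterctl get kubeconfig quickstart > ~/.kube/quickstart.yaml`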
In another terminal, switch to the new kubeconfig:
# or use kubie/kubectx etc.
export KUBECONFIG=~/.kube/quickstart.yaml
If you inspect the workload cluster nodes, you'll notice that they are all "NotReady":
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
quickstart-control-plane-jncqv NotReady control-plane 99m v1.34.8
quickstart-control-plane-7p4mx NotReady control-plane 99m v1.34.8
quickstart-control-plane-q2lnd NotReady control-plane 98m v1.34.8
quickstart-md-0-tzcm9-mlvkj NotReady <none> 98m v1.34.8
quickstart-md-0-tzcm9-8s4rk NotReady <none> 98m v1.34.8
quickstart-md-0-tzcm9-dwn2c NotReady <none> 97m v1.34.8
Cluster API provisions the cluster infrastructure and bootstrap process, but cluster add-ons such as CCM, CNI, and CSI are intentionally left configurable.
The Cluster add-ons
- CCM (cloud controller manager) connects Kubernetes to the cloud provider: it sets provider IDs and node addresses, and provisions load balancers for Services. cloudscale ships its own CCM.
- CNI (container network interface) implements pod-to-pod networking. Until a CNI is running, nodes stay NotReady. The CNI is cloud-agnostic; we'll use Cilium.
- CSI (container storage interface) dynamically provisions and attaches volumes for PersistentVolumeClaims. cloudscale ships its own CSI driver. Optional for this tutorial.
CAPI's job ends at provisioned, bootstrapped machines; making them Ready is the job of the cluster add-ons. We need the CCM and a CNI for that, so let's install them in order.
Important: Make sure to execute the following commands in your workload cluster!
Install CCM
# Make sure to have the right kubectl config context
export KUBECONFIG=~/.kube/quickstart.yaml
# create the secret; consider a separate API token so the CCM's actions can be told apart in the audit log
kubectl create secret generic cloudscale \
--from-literal=access-token='...' \
--namespace kube-system
# apply CCM
kubectl apply -f https://github.com/cloudscale-ch/cloudscale-cloud-controller-manager/releases/latest/download/config.yml
The CCM configures cloud-specific node information such as provider IDs:
$ kubectl get nodes -o custom-columns='NAME:.metadata.name,PROVIDER_ID:.spec.providerID,READY:.status.conditions[?(@.type=="Ready")].status'
NAME PROVIDER_ID READY
quickstart-control-plane-jncqv cloudscale://2d3f1a8c-9b4e-4c7a-8f12-6e0a9d3b5c41 False
quickstart-control-plane-7p4mx cloudscale://4e6b2c91-7d83-4a05-b2f1-8c3d0e5a7b62 False
quickstart-control-plane-q2lnd cloudscale://9f0a3d12-5c47-4e89-a1b6-2d8f4c70e391 False
quickstart-md-0-tzcm9-mlvkj cloudscale://7a1b9e0d-3c46-4f82-91ad-b5e2f4c80a96 False
quickstart-md-0-tzcm9-8s4rk cloudscale://1c5d7e23-8a94-4b60-9f3e-6b0a2d4c8f57 False
quickstart-md-0-tzcm9-dwn2c cloudscale://6b3f9a40-2e51-4d78-83c2-9a7e1f5b0d34 False
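On larger clusters, a filter for nodes the CCM has not initialized yet is easier to read than the full table. A sketch with a hypothetical helper that reads `<name> <providerID>` pairs and prints the nodes without a `cloudscale://` provider ID:

```shell
# Hypothetical helper: print nodes whose provider ID is missing or not cloudscale's.
# Input: one "<node-name> <provider-id>" pair per line; the ID may be empty.
nodes_without_provider_id() {
  awk '$2 !~ /^cloudscale:\/\// { print $1 }'
}
```

Feed it with `kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.providerID}{"\n"}{end}' | nodes_without_provider_id`; empty output means the CCM has set a provider ID on every node.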
Install CNI
We'll use Cilium in this example. Install the cilium CLI by following the official documentation, then run:
# Make sure to have the right kubectl config context
export KUBECONFIG=~/.kube/quickstart.yaml
cilium install
After a short while (usually less than a minute), all nodes report Ready:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
quickstart-control-plane-jncqv Ready control-plane 109m v1.34.8
quickstart-control-plane-7p4mx Ready control-plane 109m v1.34.8
quickstart-control-plane-q2lnd Ready control-plane 108m v1.34.8
quickstart-md-0-tzcm9-mlvkj Ready <none> 107m v1.34.8
quickstart-md-0-tzcm9-8s4rk Ready <none> 107m v1.34.8
quickstart-md-0-tzcm9-dwn2c Ready <none> 106m v1.34.8
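To wait for this state in a script, `kubectl wait --for=condition=Ready nodes --all --timeout=5m` does the job. If you'd rather evaluate the table itself, for example in CI logs, a small filter works too. A sketch with a hypothetical helper; it counts every STATUS not starting with `Ready`, so cordoned nodes (`Ready,SchedulingDisabled`) count as ready:

```shell
# Hypothetical helper: count nodes that are not Ready.
# Input: `kubectl get nodes --no-headers` output; STATUS is the second column.
count_not_ready_nodes() {
  awk '$2 !~ /^Ready/ { n++ } END { print n + 0 }'
}
```

`kubectl get nodes --no-headers | count_not_ready_nodes` prints `0` once all nodes are Ready.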
Congratulations! You now have a fully working cluster provisioned using Cluster API.
Install CSI (optional)
To be able to dynamically provision volumes from cloudscale, follow the installation steps for csi-cloudscale.
Day 2 Operations
Unlike infrastructure provisioning tools which are typically executed externally, Cluster API continuously reconciles cluster state from inside Kubernetes itself.
Healthchecking
The provided templates do not configure health checks for your Machines. With a MachineHealthCheck, you can define the conditions under which a Machine is considered unhealthy and have Cluster API remediate it automatically.
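As a starting point, here is a sketch that generates a MachineHealthCheck for the worker MachineDeployment. The field names follow the `cluster.x-k8s.io/v1beta2` API as we understand it, and the timeouts and the 40% threshold are example values, so check the schema with `kubectl explain machinehealthcheck.spec` on your management cluster before applying. The function name is hypothetical:

```shell
# Hypothetical helper: print a MachineHealthCheck for one MachineDeployment.
# Usage: mhc_manifest <cluster-name> <machinedeployment-name>
mhc_manifest() {
  cat <<EOF
apiVersion: cluster.x-k8s.io/v1beta2
kind: MachineHealthCheck
metadata:
  name: $2-unhealthy-5m
spec:
  clusterName: $1
  selector:
    matchLabels:
      # Label that Cluster API sets on Machines owned by a MachineDeployment.
      cluster.x-k8s.io/deployment-name: $2
  checks:
    # A Machine whose Node has not joined after 10 minutes is unhealthy.
    nodeStartupTimeoutSeconds: 600
    # A Node whose Ready condition is Unknown or False for 5 minutes is unhealthy.
    unhealthyNodeConditions:
      - type: Ready
        status: Unknown
        timeoutSeconds: 300
      - type: Ready
        status: "False"
        timeoutSeconds: 300
  remediation:
    triggerIf:
      # Stop remediating if more than 40% of the selected Machines are unhealthy.
      unhealthyLessThanOrEqualTo: 40%
EOF
}
```

Apply it against the management cluster, where the Machine objects live: `mhc_manifest quickstart quickstart-md-0 | kubectl apply -f -`.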
Scaling Workers
Scaling workers is as simple as scaling the MachineDeployment:
kubectl scale machinedeployment quickstart-md-0 --replicas=5
With the Cluster API provider for cluster-autoscaler, you can also enable autoscaling for your worker nodes. As a nice addition, CAPCS natively supports scaling from zero. Follow the README to install it.
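cluster-autoscaler's Cluster API provider only manages MachineDeployments that carry min/max size annotations. A sketch with a hypothetical helper that sets them on the management cluster; the bounds are example values, and a minimum of `0` relies on the scale-from-zero support mentioned above:

```shell
# Annotation prefix read by cluster-autoscaler's Cluster API provider.
CA_PREFIX="cluster.x-k8s.io/cluster-api-autoscaler-node-group"

# Hypothetical helper: mark a MachineDeployment as an autoscaled node group.
# Usage: enable_md_autoscaling <machinedeployment> <min> <max>
enable_md_autoscaling() {
  if ! [ "$2" -le "$3" ] 2>/dev/null; then
    echo "expected integers with min <= max, got min=$2 max=$3" >&2
    return 1
  fi
  kubectl annotate machinedeployment "$1" --overwrite \
    "$CA_PREFIX-min-size=$2" "$CA_PREFIX-max-size=$3"
}
```

Usage: `enable_md_autoscaling quickstart-md-0 0 5`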
Upgrading Kubernetes Version
As mentioned in the introduction, Cluster API supports the full lifecycle of Kubernetes cluster management, including Kubernetes version upgrades. To upgrade a cluster, first build an OS image with the target Kubernetes version (see above and the image-builder repository for guidelines) and import it into your custom images in the cloudscale Control Panel. The upgrade itself replaces the control plane machines first, and then the worker machines.
1. Copy the CloudscaleMachineTemplate
Copy the existing CloudscaleMachineTemplate, give it a new name, and set the updated image.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: CloudscaleMachineTemplate
metadata:
  name: "quickstart-control-plane-v1.35"
  namespace: "${NAMESPACE}"
spec:
  template:
    spec:
      image: "custom:ubuntu-2404-kube-v1.35.5"
      [...]
Make sure to apply the copied template.
2. Modify the existing KubeadmControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: KubeadmControlPlane
metadata:
  name: "quickstart-control-plane"
  namespace: "${NAMESPACE}"
spec:
  version: "v1.35.5" # << updated
  machineTemplate:
    spec:
      infrastructureRef:
        name: "quickstart-control-plane-v1.35" # << updated
  [...]
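Instead of editing the manifest by hand, you can apply the two marked changes with `kubectl patch` and a JSON merge patch. Both fields change in the same request, so the KubeadmControlPlane never sees the new version paired with the old image. A sketch with hypothetical helper names; run it against the management cluster once the new CloudscaleMachineTemplate exists:

```shell
# Hypothetical helper: JSON merge patch for the two fields marked "<< updated".
kcp_upgrade_patch() {
  printf '{"spec":{"version":"%s","machineTemplate":{"spec":{"infrastructureRef":{"name":"%s"}}}}}\n' "$1" "$2"
}

# Usage: upgrade_control_plane <kubeadmcontrolplane> <version> <template-name>
upgrade_control_plane() {
  kubectl patch kubeadmcontrolplane "$1" --type merge -p "$(kcp_upgrade_patch "$2" "$3")"
}
```

Usage: `upgrade_control_plane quickstart-control-plane v1.35.5 quickstart-control-plane-v1.35`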
Cluster API performs upgrades through rolling machine replacements: new machines are created with the updated configuration before old machines are removed.
As soon as the KubeadmControlPlane has been updated, you can see this in action:
$ kubectl get machine
NAME CLUSTER NODE NAME FAILURE DOMAIN READY AVAILABLE UP-TO-DATE PHASE AGE VERSION
quickstart-control-plane-69cnk quickstart quickstart-control-plane-69cnk False False True Running 104s v1.35.5
quickstart-control-plane-jncqv quickstart quickstart-control-plane-jncqv True True False Deleting 3h26m v1.34.8
quickstart-control-plane-7p4mx quickstart quickstart-control-plane-7p4mx True True False Running 3h26m v1.34.8
quickstart-control-plane-q2lnd quickstart quickstart-control-plane-q2lnd True True False Running 3h25m v1.34.8
Cluster API replaces control plane machines one at a time, keeping a quorum of healthy nodes throughout: here a new v1.35.5 machine is Running, the first old v1.34.8 machine is Deleting, and the remaining two are still Running until their turn.
Once the new control plane is ready, continue with the worker nodes. Adjusting the MachineDeployment quickstart-md-0 is left as an exercise for the reader, but it works almost the same way: copy the worker CloudscaleMachineTemplate and adjust its name and image, then edit the MachineDeployment to update its version and the referenced infrastructureRef.
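If you'd like to check your solution: on a MachineDeployment, the corresponding fields live under `spec.template.spec`. A sketch using `kubectl patch` with a JSON merge patch, which updates both fields in one request; the helper names are hypothetical, and the worker template name `quickstart-md-0-v1.35` is an assumption that depends on how you named your copy:

```shell
# Hypothetical helper: JSON merge patch for a MachineDeployment's version and
# infrastructureRef, changed together in a single request.
md_upgrade_patch() {
  printf '{"spec":{"template":{"spec":{"version":"%s","infrastructureRef":{"name":"%s"}}}}}\n' "$1" "$2"
}

# Usage: upgrade_machine_deployment <machinedeployment> <version> <template-name>
upgrade_machine_deployment() {
  kubectl patch machinedeployment "$1" --type merge -p "$(md_upgrade_patch "$2" "$3")"
}
```

Usage: `upgrade_machine_deployment quickstart-md-0 v1.35.5 quickstart-md-0-v1.35`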
Once the cluster is upgraded, the old CloudscaleMachineTemplates can safely be removed.
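A simple way to tell when the upgrade is complete is to check that no Machine of the cluster still reports the old version, including Machines that are still being deleted. A sketch with a hypothetical helper; it relies on VERSION being the last column of the default `kubectl get machines` output, as in the listing above:

```shell
# Hypothetical helper: count Machines that still run a given Kubernetes version.
# Input: `kubectl get machines --no-headers` output; VERSION is the last column.
count_machines_at_version() {
  awk -v want="$1" '$NF == want { n++ } END { print n + 0 }'
}
```

`kubectl get machines -l cluster.x-k8s.io/cluster-name=quickstart --no-headers | count_machines_at_version v1.34.8` prints `0` once all old Machines are gone.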
Promote a workload cluster to be the management cluster
A regular workload cluster can be promoted to become the management cluster. Often, the initial cluster of a Cluster API setup is referred to as a "seed" cluster and is just used for bootstrapping Cluster API. It is later abandoned and the first workload cluster becomes the management cluster.
For example, if you started with kind as your seed cluster, you may want to promote the first workload cluster to be your management cluster, where it manages itself and potentially other workload clusters.
This complex-sounding operation is relatively simple and done in two steps:
- Initialize the target cluster by running clusterctl init --infrastructure cloudscale-ch-cloudscale against it, with CLOUDSCALE_API_TOKEN set as in step 2.
- From the current management cluster, move the Cluster API objects defined in the current namespace to the target cluster:
clusterctl move --to-kubeconfig="path-to-target-kubeconfig.yaml"
Once this is done, you should be able to run all operations from the new management cluster, and you can tear down the initial cluster.
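The two steps can be wrapped into one function. A sketch with a hypothetical helper name; run it from a shell whose current kubeconfig points at the existing management cluster, since that is where `clusterctl move` reads the objects from. `--wait-providers` makes `clusterctl init` block until the new providers are ready, and `clusterctl move --dry-run` lets you preview the move separately first if you want to:

```shell
# Hypothetical helper: install the providers on the target, then move objects.
# Usage: promote_to_management <path-to-target-kubeconfig>
promote_to_management() {
  target="$1"
  if [ ! -r "$target" ]; then
    echo "cannot read kubeconfig $target" >&2
    return 1
  fi
  # clusterctl init needs the cloudscale token, as in step 2.
  if [ -z "${CLOUDSCALE_API_TOKEN:-}" ]; then
    echo "CLOUDSCALE_API_TOKEN is not set" >&2
    return 1
  fi
  clusterctl init --kubeconfig "$target" --infrastructure cloudscale-ch-cloudscale --wait-providers || return 1
  # Reads objects from the current kubeconfig and recreates them on the target.
  clusterctl move --to-kubeconfig="$target"
}
```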
Deleting workload clusters
Warning! Destructive Actions ahead!
Deleting a workload cluster is straightforward:
kubectl delete cluster <name>
Important: Always delete the Cluster object, never any of its downstream resources. Deleting those directly can leave the cluster in a state Cluster API can no longer process and may require manual cleanup (e.g. if you delete the API token secret before the cluster has been deleted). This is especially important when managing clusters with GitOps tools.
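Given the warning above, a small confirmation guard can prevent deleting the wrong cluster. A sketch with a hypothetical helper; it only ever deletes the top-level Cluster object, and `--timeout` bounds how long `kubectl delete` waits for Cluster API to finish the teardown (30 minutes is an arbitrary default):

```shell
# Hypothetical helper: require the cluster name to be typed again, then delete
# only the Cluster object and let Cluster API remove everything it owns.
# Usage: delete_workload_cluster <cluster-name> [timeout]
delete_workload_cluster() {
  printf 'Type the cluster name to confirm deletion of %s: ' "$1" >&2
  read -r answer
  if [ "$answer" != "$1" ]; then
    echo "aborted" >&2
    return 1
  fi
  # kubectl delete waits for finalizers by default; --timeout bounds that wait.
  kubectl delete cluster "$1" --timeout="${2:-30m}"
}
```

Usage: `delete_workload_cluster quickstart`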
Going further
This tutorial is meant to help you understand the basic operations available for Cluster API clusters. For production setups, you may want to automate most, if not all, of these steps. GitOps can help tremendously, and some additional Cluster API concepts are well worth looking into:
- Healthchecking, already mentioned above, is a crucial aspect of production clusters.
- ClusterResourceSet automatically applies a set of resources to all workload clusters whose labels match its selector. CAPCS maintains a couple of examples in the addons templates.
- ClusterClass becomes especially useful once you're operating many clusters with largely identical configurations.
- Cluster API Operator is an operator that manages Cluster API components in a management cluster, allowing you to handle component upgrades and configuration with GitOps.
- Cluster API Visualizer provides a visual UI for workload clusters.
What did we learn
Cluster API turns Kubernetes cluster management itself into a declarative Kubernetes workflow. Provisioning, scaling, upgrades, and teardown are all handled through Kubernetes-style reconciliation instead of custom automation scripts.
With CAPCS, cloudscale infrastructure becomes part of the broader Cluster API ecosystem, complementing the existing CCM and CSI integrations. Together, these components cover infrastructure provisioning, Kubernetes cloud integration, and persistent storage management using Kubernetes-native APIs and tooling.
While a dedicated management cluster introduces additional operational overhead, Cluster API becomes especially powerful for ephemeral environments such as CI clusters, preview environments, or temporary testing infrastructure where entire Kubernetes clusters can be created and removed declaratively.
If you would like to send us comments or corrections, you can reach our engineers at engineering-blog@cloudscale.ch.