Introduction

Back in late 2019 / early 2020 (pre-COVID) I had turned the inter-webs upside down, to no avail, looking for a tutorial I could follow on multi-master Kubernetes clusters on bare metal. Well, I have revisited that exercise, flock around! This tutorial will use VMs (on PVE as usual) for ease of use though, especially for backups.

[Screenshot: VM backups in Proxmox VE (vm-backups.png)]

  • Cloud-init Debian 13 AMD64 machines are used (remember to increase the disk size first :D). Check out Techno Tim’s video, which was linked in proxmox-cloudinit-arm64-vm, for details, or just use a NoCloud/manually installed Debian.
  • A bridged network is used on the host to better simulate the bare-metal experience. For those who would try this on bare metal, adjust the interface names, which are conveniently all eth0 in this post.
  • I would check out Talos for bare metal and Kairos (on Yocto) for embedded scenarios. If you work for the government, nobody ever got fired for choosing IBM [or SUSE].
  • I have yet to test Cilium as the CNI. It is praised for its observability, but one step at a time; Calico is used instead.
  • Unfortunately, redundant storage (e.g. Rook, Longhorn) is not within the scope of this exercise either. Hopefully, there will be follow-up posts.

Aim

  1. A self-hosted 3-node Kubernetes cluster where each node is part of the control plane.
  2. The ability to take down any one of the nodes and still have an operational API server and Services.
  3. Use of simple network equipment, as the fancier kind is not cheap where I currently am.

Given the points above, this should be treated as just an educational resource.

Summary

Creating a highly available cluster with kubeadm is already explained in the official documentation. However, that assumes an existing load balancer, the setup of which is the main focus here.

Kube-VIP will be deployed on the control plane nodes as Static Pods to load balance the control plane only. This allows the cluster to be bootstrapped.

Thereafter, a CNI network operator needs to be deployed. Calico seems traditional and functional enough (Network Policies).

Lastly, Kube-VIP is deployed a second time, as a DaemonSet, to load balance Services. Even with the use of Ingress/Gateway API, a Virtual IP is desired for the Ingress itself. This part could be replaced with another load balancer (e.g. MetalLB); however, sticking to a single project means fewer moving parts.

Steps

Disclaimer: AI chatbots (i.e. ChatGPT and Gemini) were consulted not in the making of this blog post, but in debugging errors while distilling the following steps. Otherwise, sources are added where applicable.

The cluster is initially deployed on node cp1. If a set of commands needs to be run on certain nodes, it is denoted by appending them to the step title, e.g. @all.

Nodes:

  1. cp1: 192.168.60.241/24
  2. cp2: 192.168.60.242/24
  3. cp3: 192.168.60.243/24

Virtual IPs:

  1. control plane: 192.168.60.240 (A record k8s-cp.bug.tr)
  2. services: 192.168.60.250 (A record k8s.bug.tr)

Note: Run as root.

1. Disable swap @all

swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
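
To double-check, something like the following should show no active swap:

swapon --show # should print nothing
free -h       # the Swap line should read 0B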

2. Install Containerd @all

apt install containerd -y
systemctl enable --now containerd

3. Set containerd as endpoint for crictl [1] @all

cat >> /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 2
debug: true
pull-image-on-create: false
EOF

4. Enable IPv4 forwarding for pod networking to work [2] @all

# sysctl params required by setup, params persist across reboots
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
EOF

# Apply sysctl params without reboot
sysctl --system
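
To verify the setting took effect:

sysctl net.ipv4.ip_forward # expect: net.ipv4.ip_forward = 1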

5. Set Containerd Configuration [2] [3] [4] [5] @all

If containerd is installed through:

  1. Docker’s repositories, the default config file disables the CRI plugin (and with it pod networking).
  2. the Ubuntu 24.04 APT repository, the config file is blank.
  3. the Debian 13 APT repository, the config file has bin_dir = "/usr/lib/cni" [4]. This is done to comply with the Linux FHS, yet it breaks the Calico installation unless fixed.

It is best to overwrite the whole file as follows. (One would use Jinja2 templates rather than inline edits if this were done with Ansible anyway.)

mkdir -p /etc/containerd
cat <<EOF | tee /etc/containerd/config.toml
version = 2
[plugins]
    [plugins."io.containerd.grpc.v1.cri"]
        [plugins."io.containerd.grpc.v1.cri".cni]
        bin_dir = "/opt/cni/bin"
        conf_dir = "/etc/cni/net.d"
    [plugins."io.containerd.grpc.v1.cri".containerd]
        default_runtime_name = "runc"
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
            runtime_type = "io.containerd.runc.v2"
            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
                SystemdCgroup = true
    [plugins."io.containerd.internal.v1.opt"]
        path = "/var/lib/containerd/opt"
EOF
systemctl restart containerd
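
To confirm what containerd actually ends up running with, the merged configuration can be dumped and the relevant keys checked:

containerd config dump | grep -E 'SystemdCgroup|bin_dir' # expect SystemdCgroup = true and bin_dir = "/opt/cni/bin"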

6. Install Kubeadm (v1.34.1) [6] @all

apt-get install -y apt-transport-https ca-certificates curl gpg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.34/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.34/deb/ /' | tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
systemctl enable --now kubelet
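
cri-tools (which provides crictl) comes along as a dependency of the packages above, so the versions and the CRI endpoint from Step 3 can be sanity checked now:

kubeadm version -o short
kubelet --version
crictl version # should report containerd through the configured socket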

7. Cache Kubeadm container images @all

Downloads were a bit slow on my end. Backing up the VMs before initializing the cluster saves time on remakes.

kubeadm config images pull
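
The cached images can then be listed through both kubeadm and crictl:

kubeadm config images list # what kubeadm expects for this version
crictl images              # what is actually cached on the node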

8. Set variables in shell @all

Trivia: It seems that the Turkish government no longer hands out 3-character .tr domains. If you want to scam some people, try getting CORN.TR/corn.tr (I could not :/).

# Latest at the time of writing. curl and jq can be used to fetch it dynamically (see the sketch after this block)
export KVVERSION=v1.0.1

export INIT_NODE_HOSTNAME=cp1 # used for if statement later on
export INIT_NODE_ADDR=192.168.60.241

export PORT_CONTROL_PLANE=6443 # default
export DNS_CONTROL_PLANE=k8s-cp.bug.tr
export VIP_ADDR_CP=192.168.60.240
export VIP_SUBNET_CP=32 # A single IP is desired after all
export VIP_INTERFACE_CP=eth0 # uniform across VMs
export POD_CIDR=172.24.0.0/14 # arbitrary
export SVC_CIDR=10.24.0.0/16 # arbitrary
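
As noted in the comment above, the latest release tag can also be fetched dynamically. A minimal sketch using the GitHub releases API (jq is assumed to be installed, e.g. via apt install jq):

export KVVERSION=$(curl -fsSL https://api.github.com/repos/kube-vip/kube-vip/releases/latest | jq -r .tag_name)
echo "${KVVERSION}" # e.g. v1.0.1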

9. Create Static Pod YAML file for Kube-VIP [7] @all

This uses ARP mode; BGP mode would be better suited to a setup with a capable router. As is, however, this fits most edge deployments I’ve seen better.

Kube-VIP’s container image is used to generate a valid Static Pod manifest to be spun up by Kubelet. Against my first intuition, there is no need to assign the IP address manually beforehand, as the whole cluster should be able to be taken down and rebooted after a disaster.

Note: At one point the Virtual IP was not being assigned while etcd and the API server were in a constant CrashLoop. The issue was later resolved by changing the containerd config to use the systemd cgroup driver (cgroups v2). The IP should just come up; the --leaderElection option works, just not in tandem with --servicesElection (YMMV).

Note: The --k8sConfigPath /etc/kubernetes/super-admin.conf option is not needed, as this Static Pod is used only for the control plane. The service load balancer will be set up after the cluster is brought up.

mkdir -p /etc/kubernetes/manifests

alias kube-vip="ctr image pull ghcr.io/kube-vip/kube-vip:$KVVERSION; ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:$KVVERSION vip /kube-vip"

kube-vip manifest pod \
    --interface "${VIP_INTERFACE_CP}" \
    --address "${VIP_ADDR_CP}" \
    --vipSubnet "${VIP_SUBNET_CP}" \
    --controlplane \
    --lbPort "${PORT_CONTROL_PLANE}" \
    --arp \
    --leaderElection | tee /etc/kubernetes/manifests/kube-vip.yaml

10. Initialize Cluster [8] @cp1

I made use of AI chatbots to translate my imperative CLI command into a configuration YAML.

The configuration below adds the IP addresses and hostnames of the nodes, the VIP, and the subdomain to the API server certificate SANs. If expansion is desired later, more entries need to be added beforehand. Like NOW!

There is also a command embedded as a comment which updates the Kubelet configuration from the file. It would need to be run on each node separately, with proper cordoning/uncordoning of course.

cat > kubeadm-config.yaml <<EOF
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "${INIT_NODE_ADDR}"
  bindPort: ${PORT_CONTROL_PLANE}
nodeRegistration:
  # Run for post-install changes:
  # kubeadm init phase kubelet-start --config ./kubeadm-config.yaml; systemctl restart kubelet
  kubeletExtraArgs:
  - name: "node-ip"
    value: "${INIT_NODE_ADDR}"
  criSocket: "unix:///run/containerd/containerd.sock"
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: "v1.34.1"
controlPlaneEndpoint: "${DNS_CONTROL_PLANE}:${PORT_CONTROL_PLANE}"
networking:
  podSubnet: "${POD_CIDR}"
  serviceSubnet: "${SVC_CIDR}"
  dnsDomain: "${DNS_CONTROL_PLANE}"
apiServer:
  certSANs:
  - "${DNS_CONTROL_PLANE}"
  - "${VIP_ADDR_CP}"
  - "${INIT_NODE_ADDR}"
  - "${INIT_NODE_HOSTNAME}"
  # Add the rest of the nodes (example entries below)
EOF
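
For completeness, and assuming the hostnames match the node names listed earlier, the remaining certSANs entries would look like this:

  - "192.168.60.242"
  - "cp2"
  - "192.168.60.243"
  - "cp3"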
kubeadm init --upload-certs --config ./kubeadm-config.yaml
export KUBECONFIG=/etc/kubernetes/admin.conf

This should result in output similar to the below upon success:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes running the following command on each as root:

  kubeadm join k8s-cp.bug.tr:6443 --token a7wubm.4khburyb6idwzufw \
        --discovery-token-ca-cert-hash sha256:f472af757037dcd39aaa197140229e59829e26448c8b0017a6b79b377b1a224f \
        --control-plane --certificate-key 865cae84e92fb3a69c1a55c2ed38982b0a2a854ecda0d686eee140745077800e

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join k8s-cp.bug.tr:6443 --token a7wubm.4khburyb6idwzufw \
        --discovery-token-ca-cert-hash sha256:f472af757037dcd39aaa197140229e59829e26448c8b0017a6b79b377b1a224f
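
At this point the Virtual IP should be up on the initial node and the API server reachable through it; a quick check could look like the following (the /version endpoint is served to anonymous clients on default kubeadm setups):

ip addr show eth0 | grep 192.168.60.240    # VIP should be attached on the current leader
curl -k https://k8s-cp.bug.tr:6443/version # API server answers through the VIP's DNS name
kubectl get nodes                          # works since KUBECONFIG was exported above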

11. Add Control Plane Nodes @cp2, @cp3

Simply run the control-plane join command given in kubeadm init’s output. This step can be performed at any point after the cluster is initiated; the next steps do not depend on it.
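
Once the joins finish, all three nodes should show up, though they will stay NotReady until the CNI is installed in the next step:

kubectl get nodes -o wide # expect cp1, cp2 and cp3 with the control-plane role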

12. Add Calico to Cluster [9] @cp1

iptables is used instead of eBPF, as this is meant to be a rather simple exercise. Cilium seems to be the better eBPF CNI anyway.

# Add CNI Operator to get node into Ready state
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.31.0/manifests/operator-crds.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.31.0/manifests/tigera-operator.yaml

curl -O https://raw.githubusercontent.com/projectcalico/calico/v3.31.0/manifests/custom-resources.yaml
sed -i "s|cidr: .*|cidr: ${POD_CIDR}|g" custom-resources.yaml # change Pod network CIDR line
kubectl create -f custom-resources.yaml
watch kubectl get tigerastatus # wait until available
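
Once tigerastatus reports available, the nodes should flip to Ready as well; roughly:

kubectl get pods -n calico-system # the operator installs the Calico components here
kubectl get nodes                 # all three nodes should now report Ready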

Note: As an alternative to changing containerd’s bin_dir (in Step 5), one might be able to change it on the Calico side by altering the Installation resource in custom-resources.yaml. This has not been tested.

spec:
  ...
  # Explicit values matching /etc/containerd/config.toml
  cni:
    type: Calico
    binDir: /usr/lib/cni
    confDir: /etc/cni/net.d

13. Install Kube-VIP DaemonSet for Services [10] [11] @cp1

The --servicesElection option can be used here, as this DaemonSet’s VIPs do not create a bootstrapping issue. The --controlplane option is omitted for that exact reason.

kubectl apply -f https://kube-vip.io/manifests/rbac.yaml

export VIP_RANGE_SVC=192.168.60.250-192.168.60.254 # only the first one is expected to be used
export VIP_INTERFACE_SVC=eth0 # uniform for VMs, but how to tackle for bare-metal?
export KVVERSION=v1.0.1

alias kube-vip="ctr image pull ghcr.io/kube-vip/kube-vip:$KVVERSION; ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:$KVVERSION vip /kube-vip"

# notice missing --controlplane --taint --address options
kube-vip manifest daemonset \
    --interface "${VIP_INTERFACE_SVC}" \
    --services \
    --inCluster \
    --arp \
    --leaderElection \
    --servicesElection | tee kube-vip-lb-ds.yaml

kubectl create -f kube-vip-lb-ds.yaml

kubectl create configmap -n kube-system kubevip --from-literal "range-global=${VIP_RANGE_SVC}"

kubectl apply -f https://raw.githubusercontent.com/kube-vip/kube-vip-cloud-provider/main/manifest/kube-vip-cloud-controller.yaml
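
Before testing, it is worth confirming the pieces are actually running; the namespace and names below are what the upstream manifests used at the time of writing, so treat them as an assumption:

kubectl get daemonset -n kube-system                 # the generated kube-vip DaemonSet
kubectl get pods -n kube-system | grep -i kube-vip   # DaemonSet pods + cloud provider
kubectl get configmap -n kube-system kubevip -o yaml # should show range-global from above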

14. Try the Load Balancer @cp1

Follow Kube-VIP’s multi load balancer example, or use the quick smoke test below.
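
A throwaway nginx Deployment makes for a minimal smoke test; the name and image below are just for illustration, and the exact EXTERNAL-IP depends on what the cloud provider hands out from the range:

kubectl create deployment nginx-lb-test --image=nginx --replicas=2
kubectl expose deployment nginx-lb-test --port=80 --type=LoadBalancer
kubectl get svc nginx-lb-test # EXTERNAL-IP should come from 192.168.60.250-254
curl http://192.168.60.250    # or http://k8s.bug.tr if the first address was handed out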

Bibliography

  1. GitHub: crictl
  2. k8s.io/docs: Container Runtimes
  3. Reddit: JohnHowardsEyebrows’s comment
  4. LinkedIn: Jeremy Hendricks’ post
  5. Kubernetes Discussion: AndrewLawrence80’s comment
  6. k8s.io/docs: Installing Kubeadm
  7. Kube-VIP: Static Pods
  8. k8s.io/docs: Kubeadm Configuration
  9. Calico: On-Premises
  10. Kube-VIP: DaemonSet
  11. Kube-VIP: Cloud Provider