A while ago, I wrote about my self-built kubernetes cluster. One interesting detail of it was the lack of a CNI plugin: the routes for pod-to-pod networking were configured statically, via ansible.
This worked great. I just had to configure the IP ranges (v4 and v6) in the docker daemon and set up routes
in interface-up scripts, which I had anyway. In the beginning I had a tinc tunnel between the nodes, which
I later migrated to wireguard, and later still to vxlan on top of wireguard (to allow dynamic routing with
BGP, which I use for metallb, among other things). But docker support is deprecated in kubernetes, so I had
to change the container runtime. Both realistic alternatives, containerd and cri-o, require a CNI network
config, so switching to CNI was unavoidable.
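For illustration only (the ranges below are placeholders, not my real ones): the docker side of that old setup boils down to a daemon.json along these lines.

```json
{
  "bip": "10.123.1.1/24",
  "fixed-cidr": "10.123.1.0/24",
  "ipv6": true,
  "fixed-cidr-v6": "fd00:123:1::/64"
}
```

The interface-up scripts then simply add an `ip route add <other node’s pod range> via <other node’s tunnel IP>` for every peer.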
I started by modifying my ansible playbook to push the generated pod network ranges for each node into the Node
objects (.spec.podCIDRs), fetch the default CNI plugins and place them in /opt/cni/bin. After that, I
applied the flannel manifest, which I modified to use host-gw mode (since I already have a vxlan that I also use
for other purposes, and which was initially created for k8s, so it’s not a misuse of the existing one).
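For reference, switching flannel to host-gw is a one-line change in the net-conf.json inside its ConfigMap; the pod network below is flannel’s stock example, not my actual ranges:

```json
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "host-gw"
  }
}
```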
After I had flannel running on my cluster and had rebooted all nodes gracefully, I migrated from docker to cri-o.
My cluster even felt more stable after that change, probably (in part) due to no longer having two cgroup
managers running at the same time - docker and kubelet were configured for cgroupfs, while all nodes run debian,
and therefore systemd. With the change to cri-o, I also configured kubelet to use systemd as its cgroup driver.
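If you want to do the same, the two settings involved look roughly like this (the paths are the common defaults, e.g. from kubeadm, and may differ on your setup):

```yaml
# /var/lib/kubelet/config.yaml (KubeletConfiguration)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```

```toml
# /etc/crio/crio.conf (or a drop-in under /etc/crio/crio.conf.d/)
[crio.runtime]
cgroup_manager = "systemd"
```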
If the simple solutions were perfect, this post could stop now - but, oh well, flannel does not yet support
dual-stack networking (support has been added by now, but not released yet), so I didn’t have IPv6 anymore in my
containers, resulting in a lot of "Instance down!" alerts from my monitoring, since I have some IPv6-only services
to monitor. Also, I have my own /48 IPv6 PI allocation (PI: provider independent, so it can be routed via any
upstream provider I have a connection to), from which I want to use a portion as LoadBalancer IPs - not having
IPv6 just isn’t an option at all anymore (did you see the prices for IPv4? I wanted to pay some money for some,
but some money just isn’t enough O_O).
Being the calmest and most patient person in existence (*ahem*), I decided to build my own small program for that.
Grabbing each Node’s IPs and adding routes to the other Nodes’ pod networks seemed like a simple enough thing to
do. And I succeeded - I built this in (mostly) two days, which weren’t even fully spent on it (after all, I’m
still in hospital and therapy right now), and deployed it - replacing flannel in host-gw mode completely.
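Just to illustrate the idea (this is not the actual kube-hostgw code - for that, see the repo linked at the end): a minimal sketch in Go, assuming client-go for the API access, a NODE_NAME variable injected via the downward API, and plain `ip route replace` for the route handling, could look like this:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// sameFamily reports whether a CIDR and an IP are both IPv4 or both IPv6.
func sameFamily(cidr, ip string) bool {
	return strings.Contains(cidr, ":") == strings.Contains(ip, ":")
}

func main() {
	// Assumes we run inside the cluster (e.g. from a DaemonSet) with a
	// service account that is allowed to list Nodes.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	nodes, err := client.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	for _, node := range nodes.Items {
		// Skip the local node - its pod CIDRs are handled by the bridge plugin.
		if node.Name == os.Getenv("NODE_NAME") {
			continue
		}
		// Collect the node's InternalIPs; a dual-stack node has one per family.
		var nodeIPs []string
		for _, addr := range node.Status.Addresses {
			if addr.Type == corev1.NodeInternalIP {
				nodeIPs = append(nodeIPs, addr.Address)
			}
		}
		// Route every pod CIDR of that node via a node IP of the same family.
		for _, cidr := range node.Spec.PodCIDRs {
			for _, ip := range nodeIPs {
				if !sameFamily(cidr, ip) {
					continue
				}
				// "replace" is idempotent, so the loop can simply be re-run.
				out, err := exec.Command("ip", "route", "replace", cidr, "via", ip).CombinedOutput()
				if err != nil {
					fmt.Printf("route %s via %s failed: %v (%s)\n", cidr, ip, err, out)
				}
			}
		}
	}
}
```

A real implementation would rather watch Nodes instead of listing them once, and clean up routes for removed nodes - but the core of host-gw routing really is just "pod CIDR via node IP".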
It’s important to note that kube-hostgw isn’t a CNI plugin itself; it only generates a CNI network config
based on the default plugins, namely bridge, portmap and host-local. The bridge plugin, combined with
host-local for IP address allocation, does what docker does by default: create a Linux bridge interface, give
it the first IP of the range handed to host-local, and create a veth pair for every container, with the host side
added to the bridge and the container side configured with another IP from that same range.
portmap is required for forwarding ports from the node into pods (hostPort) - I don’t (yet) know how that plugin
works exactly, or where it gets its information from - but it was listed as a dependency of flannel (and the
flannel CNI config used it) and everything works.
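A generated conflist in /etc/cni/net.d could look roughly like this - name, cniVersion and subnets are placeholders here, the real file is generated per node from its PodCIDRs:

```json
{
  "cniVersion": "0.4.0",
  "name": "hostgw",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": false,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{ "subnet": "10.244.3.0/24" }],
          [{ "subnet": "fd00:1234:5678:3::/64" }]
        ],
        "routes": [
          { "dst": "0.0.0.0/0" },
          { "dst": "::/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}
```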
With kube-hostgw not being a CNI plugin itself, its stability is not very important - it only needs to keep
running in a loop to pick up Nodes being added or removed, or new PodCIDRs being allocated - apart from that, it
could be built as a Job instead of a DaemonSet (but sadly there is no JobSet - one Job per Node matching some rules).
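For the DaemonSet variant, the deployment boils down to something like the following - the image, names and RBAC wiring are placeholders, not kube-hostgw’s actual manifest:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-hostgw
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kube-hostgw
  template:
    metadata:
      labels:
        app: kube-hostgw
    spec:
      hostNetwork: true                # routes have to land in the node's routing table
      serviceAccountName: kube-hostgw  # needs RBAC to list/watch Nodes
      containers:
        - name: kube-hostgw
          image: registry.example.org/kube-hostgw:latest  # placeholder image
          securityContext:
            capabilities:
              add: ["NET_ADMIN"]       # required to modify routes
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
```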
Best thing: after I deployed kube-hostgw to my cluster (I still only have this one, so my testing environment is
sadly the same as my prod environment ^^’), one of my girlfriends was also building a kubernetes cluster, to move
all their stuff into it. They, too, started with flannel (I think that was my recommendation) and migrated to
kube-hostgw once it was usable (so... even before it had a version number, or even a LICENSE or README). She then
used kube-hostgw in her prod cluster from the start - so it’s already running in three clusters, just doing its job :3
She also wrote about it (mostly about her cluster, a bit about kube-hostgw, and quite a lot about how friendly the
kubernetes community is) - check out her post, too!
If you want to read more about kube-hostgw, take a look at
the project in my Gitlab instance :)