A while ago, I wrote about my self-built kubernetes cluster. One interesting detail of it was the lack of a CNI plugin: the routes for pod-to-pod networking were configured statically, via ansible.
This worked great. I just had to configure the IP ranges (v4 and v6) in the docker daemon and set up routes
in interface-up scripts, which I had anyway. In the beginning I had a tinc tunnel between the nodes, which
I later migrated to wireguard, and later still to vxlan on top of wireguard (to allow dynamic routing with
BGP, which I use for metallb, among other things). But docker support is deprecated in kubernetes, so I had
to change the container runtime. Both realistic alternatives, containerd and cri-o, require a CNI network
config, so switching to CNI was unavoidable.
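For illustration only (the ranges below are placeholders, not my real ones): the docker side of that old setup boils down to a daemon.json along these lines.

```json
{
  "bip": "10.123.1.1/24",
  "fixed-cidr": "10.123.1.0/24",
  "ipv6": true,
  "fixed-cidr-v6": "fd00:123:1::/64"
}
```

The interface-up scripts then simply add an `ip route add <other node’s pod range> via <other node’s tunnel IP>` for every peer.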
I started by modifying my ansible playbook to push the generated pod network ranges for each node into the Node
objects (.spec.podCIDRs), fetch the default CNI plugins and place them in /opt/cni/bin. After that, I
applied the flannel manifest, which I modified to use host-gw mode (since I already have a vxlan that I also use
for other purposes, and which was initially created for k8s, so it’s not a misuse of the existing one).
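For reference, switching flannel to host-gw is a one-line change in the net-conf.json inside its ConfigMap; the pod network below is flannel’s stock example, not my actual ranges:

```json
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "host-gw"
  }
}
```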
After I had flannel running on my cluster and had rebooted all nodes gracefully, I migrated from docker to cri-o.
My cluster even felt more stable after that change, probably (in part) due to no longer having two cgroup
managers running at the same time - docker and kubelet were configured for cgroupfs, while all nodes run debian,
and therefore systemd. With the change to cri-o, I also configured kubelet to use systemd as its cgroup driver.
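If you want to do the same, the two settings involved look roughly like this (the paths are the common defaults, e.g. from kubeadm, and may differ on your setup):

```yaml
# /var/lib/kubelet/config.yaml (KubeletConfiguration)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```

```toml
# /etc/crio/crio.conf (or a drop-in under /etc/crio/crio.conf.d/)
[crio.runtime]
cgroup_manager = "systemd"
```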
If the simple solutions were perfect, this post could stop now - but, oh well, flannel does not yet support
dual-stack networking (support has been added by now, but not released yet), so I didn’t have IPv6 anymore in my
containers, resulting in a lot of "Instance down!" alerts from my monitoring, since I have some IPv6-only services
to monitor. Also, I have my own /48 IPv6 PI allocation (PI: provider independent, so it can be routed via any
upstream provider I have a connection to), from which I want to use a portion as LoadBalancer IPs - not having
IPv6 just isn’t an option at all anymore (did you see the prices for IPv4? I wanted to pay some money for some,
but some money just isn’t enough O_O).
Being the calmest and most patient person in existence (*ahem*), I decided to build my own small program for that.
Grabbing each Node’s IPs and adding routes to the other Nodes’ pod networks seemed like a simple enough thing to
do. And I succeeded - I built this in (mostly) two days, which weren’t even fully spent on it (after all, I’m
still in hospital and therapy right now), and deployed it - replacing flannel in host-gw mode completely.
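Just to illustrate the idea (this is not the actual kube-hostgw code - for that, see the repo linked at the end): a minimal sketch in Go, assuming client-go for the API access, a NODE_NAME variable injected via the downward API, and plain `ip route replace` for the route handling, could look like this:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// sameFamily reports whether a CIDR and an IP are both IPv4 or both IPv6.
func sameFamily(cidr, ip string) bool {
	return strings.Contains(cidr, ":") == strings.Contains(ip, ":")
}

func main() {
	// Assumes we run inside the cluster (e.g. from a DaemonSet) with a
	// service account that is allowed to list Nodes.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	nodes, err := client.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	for _, node := range nodes.Items {
		// Skip the local node - its pod CIDRs are handled by the bridge plugin.
		if node.Name == os.Getenv("NODE_NAME") {
			continue
		}
		// Collect the node's InternalIPs; a dual-stack node has one per family.
		var nodeIPs []string
		for _, addr := range node.Status.Addresses {
			if addr.Type == corev1.NodeInternalIP {
				nodeIPs = append(nodeIPs, addr.Address)
			}
		}
		// Route every pod CIDR of that node via a node IP of the same family.
		for _, cidr := range node.Spec.PodCIDRs {
			for _, ip := range nodeIPs {
				if !sameFamily(cidr, ip) {
					continue
				}
				// "replace" is idempotent, so the loop can simply be re-run.
				out, err := exec.Command("ip", "route", "replace", cidr, "via", ip).CombinedOutput()
				if err != nil {
					fmt.Printf("route %s via %s failed: %v (%s)\n", cidr, ip, err, out)
				}
			}
		}
	}
}
```

A real implementation would rather watch Nodes instead of listing them once, and clean up routes for removed nodes - but the core of host-gw routing really is just "pod CIDR via node IP".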
It’s important to note that kube-hostgw isn’t a CNI plugin itself; it only generates a CNI network config
based on the default plugins, namely bridge, portmap and host-local. The bridge plugin, combined with
host-local for IP address allocation, does what docker does by default: create a Linux bridge interface, give
it the first IP of the range handed to host-local, and create a veth pair for every container, with the host side
added to the bridge and the container side configured with another IP from that same range.
portmap is required for forwarding ports from the node into pods (hostPort) - I don’t (yet) know how that plugin
works exactly, or where it gets its information from - but it was listed as a dependency of flannel (and the
flannel CNI config used it) and everything works.
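A generated conflist in /etc/cni/net.d could look roughly like this - name, cniVersion and subnets are placeholders here, the real file is generated per node from its PodCIDRs:

```json
{
  "cniVersion": "0.4.0",
  "name": "hostgw",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": false,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{ "subnet": "10.244.3.0/24" }],
          [{ "subnet": "fd00:1234:5678:3::/64" }]
        ],
        "routes": [
          { "dst": "0.0.0.0/0" },
          { "dst": "::/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}
```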
With kube-hostgw not being a CNI plugin itself, its stability is not very important - it only needs to keep
running in a loop to pick up Nodes being added or removed, or new PodCIDRs being allocated - apart from that, it
could be built as a Job instead of a DaemonSet (but sadly there is no JobSet - one Job per Node matching some rules).
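For the DaemonSet variant, the deployment boils down to something like the following - the image, names and RBAC wiring are placeholders, not kube-hostgw’s actual manifest:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-hostgw
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kube-hostgw
  template:
    metadata:
      labels:
        app: kube-hostgw
    spec:
      hostNetwork: true                # routes have to land in the node's routing table
      serviceAccountName: kube-hostgw  # needs RBAC to list/watch Nodes
      containers:
        - name: kube-hostgw
          image: registry.example.org/kube-hostgw:latest  # placeholder image
          securityContext:
            capabilities:
              add: ["NET_ADMIN"]       # required to modify routes
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
```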
Best thing: after I deployed kube-hostgw to my cluster (I still only have this one, so my testing environment is
sadly the same as my prod environment ^^’), one of my girlfriends was also building a kubernetes cluster, to move
all their stuff into it. They, too, started with flannel (I think that was my recommendation) and migrated to
kube-hostgw once it was usable (so... even before it had a version number, or even a LICENSE or README). She then
used kube-hostgw in her prod cluster from the start - so it’s already running in three clusters, just doing its job :3
She also wrote about it (mostly about her cluster, a bit about kube-hostgw, and quite a lot about how friendly the
kubernetes community is) - check out her post, too!
If you want to read more about kube-hostgw, take a look at
the project in my Gitlab instance :)