AKS node reboot after security patches using Kured

As part of its maintenance of Azure Kubernetes Service (AKS), Microsoft installs security updates on the nodes. This happens overnight. Some of these updates, however, require a reboot to take effect, and as the user it is your responsibility to make sure your nodes actually get rebooted.

You can automate this reboot process using a reboot daemon called Kured.
With AKS Linux nodes, you install Kured as a DaemonSet on your cluster, and it then periodically checks each node for the file /var/run/reboot-required. The package manager creates this file when installed updates require a restart.
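
To illustrate the mechanism, here is a minimal Python sketch of the per-node check: poll for the sentinel file and trigger a reboot once it appears. This is not Kured's code (Kured itself is written in Go); the names reboot_required and watch, the on_reboot callback and the one-hour polling period are assumptions made for this sketch only.

```python
import os
import tempfile
import time
from typing import Callable, Optional

# Sentinel file the package manager creates when an update needs a restart.
SENTINEL = "/var/run/reboot-required"


def reboot_required(sentinel: str = SENTINEL) -> bool:
    """Return True if the reboot-required sentinel file exists."""
    return os.path.exists(sentinel)


def watch(on_reboot: Callable[[], None],
          sentinel: str = SENTINEL,
          period_s: float = 3600.0,
          max_checks: Optional[int] = None) -> int:
    """Poll for the sentinel every `period_s` seconds (illustrative default).

    Calls `on_reboot` once when the file appears and returns the number of
    checks performed. `max_checks` bounds the loop (None = poll forever).
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if reboot_required(sentinel):
            on_reboot()
            return checks
        time.sleep(period_s)
    return checks


if __name__ == "__main__":
    # Use a temporary path instead of the real sentinel so this runs anywhere.
    path = os.path.join(tempfile.mkdtemp(), "reboot-required")
    print(watch(lambda: print("reboot!"), path, period_s=0, max_checks=3))  # no file yet: 3 checks
    open(path, "w").close()  # the equivalent of `touch`
    watch(lambda: print("reboot!"), path, period_s=0, max_checks=3)
```

Polling a plain file keeps the daemon decoupled from the package manager: anything that creates the sentinel, including a manual touch, will trigger the same path.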

For more information on how to configure it, see the Kured documentation.

You can test that it works by SSHing onto a node and creating that file there (e.g. running sudo touch /var/run/reboot-required). On its next check, Kured should cordon the node, drain its pods so that they are rescheduled onto other nodes, and only once they are running elsewhere reboot the node.
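
The cordon, drain, reboot ordering can be modelled with a toy in-memory cluster. This is a simplified Python sketch, not how Kured or Kubernetes is implemented: Node and cordon_drain_reboot are hypothetical names, pods are plain strings, and "rescheduling" just means appending the pod to the least-loaded schedulable node.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Toy stand-in for a cluster node: a name, its pods and a cordon flag."""
    name: str
    pods: List[str] = field(default_factory=list)
    schedulable: bool = True
    reboots: int = 0


def cordon_drain_reboot(nodes: Dict[str, Node], target: str) -> List[str]:
    """Cordon `target`, move its pods to other schedulable nodes, then reboot it.

    Returns a log of the steps. If no other node can take the pods, the drain
    stops and the node is left cordoned without being rebooted.
    """
    log: List[str] = []
    node = nodes[target]
    node.schedulable = False  # cordon: the scheduler stops placing new pods here
    log.append(f"cordon {target}")
    others = [n for n in nodes.values() if n.schedulable]
    while node.pods:  # drain: evict each pod and let it land elsewhere
        if not others:
            log.append(f"drain {target} blocked: no other schedulable node")
            return log
        pod = node.pods.pop(0)
        dest = min(others, key=lambda n: len(n.pods))  # least-loaded node
        dest.pods.append(pod)
        log.append(f"evict {pod} -> {dest.name}")
    node.reboots += 1  # reboot only after the node has been emptied
    log.append(f"reboot {target}")
    return log


if __name__ == "__main__":
    cluster = {
        "node-0": Node("node-0", ["web-1", "web-2"]),
        "node-1": Node("node-1"),
        "node-2": Node("node-2", ["db-1"]),
    }
    for line in cordon_drain_reboot(cluster, "node-0"):
        print(line)
```

Rebooting only after the drain completes is what keeps workloads available during the patch cycle; the blocked branch shows what happens when there is nowhere to move them.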

One thing to note is that Kured seems to cause problems on single-node clusters (yes, I know you should have at least 3 nodes, but for dev setups you sometimes simply don't): it cordons the node and starts draining the pods, but there is nowhere to reschedule them, which leaves the whole cluster in an unusable state.
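
The single-node problem comes down to a simple condition: a drain can only complete if some other node is able to accept the evicted pods. A hypothetical pre-flight check along those lines might look like the Python sketch below; drain_can_succeed is an invented name, and the check deliberately ignores resource requests, taints and PodDisruptionBudgets, which a real scheduler would also take into account.

```python
from typing import Dict


def drain_can_succeed(schedulable: Dict[str, bool], target: str) -> bool:
    """Return True if at least one node other than `target` accepts new pods.

    `schedulable` maps node name -> True if the node is not cordoned.
    Resource requests, taints and PodDisruptionBudgets are ignored here.
    """
    return any(ok for name, ok in schedulable.items() if name != target)


if __name__ == "__main__":
    print(drain_can_succeed({"dev-node": True}, "dev-node"))  # single node: False
    print(drain_can_succeed({"n0": True, "n1": True, "n2": True}, "n0"))  # True
```

On a one-node dev cluster this condition can never hold, which is why the drain has nowhere to go.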
