How SSO brought a cluster to its knees!
Setting the scene
So imagine you have a cluster with publicly accessible endpoints, and you’re using a mixture of the NGINX Ingress Controller and OAuth2 Proxy to provide Single Sign-On capabilities for these endpoints.
The flow for accessing these endpoints is as follows:
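The original flow diagram isn’t reproduced here, but as a rough sketch (the hostnames and annotation values below are illustrative, not our actual configuration), the NGINX Ingress Controller typically hands authentication off to OAuth2 Proxy via its auth annotations, and OAuth2 Proxy in turn sends unauthenticated users through Dex:

```yaml
# Illustrative Ingress protected by oauth2-proxy; hostnames are made up.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: protected-app
  annotations:
    # NGINX makes a subrequest here first; a 2xx response lets the request through.
    nginx.ingress.kubernetes.io/auth-url: "https://oauth2-proxy.example.com/oauth2/auth"
    # Unauthenticated users are redirected here to begin the SSO flow (via Dex).
    nginx.ingress.kubernetes.io/auth-signin: "https://oauth2-proxy.example.com/oauth2/start?rd=$escaped_request_uri"
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: protected-app
                port:
                  number: 80
```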
So, what happened?
We had a cascading failure due to an exceptionally high number of Single Sign-On requests by somebody performing a dictionary attack on our endpoints.
For every request made to Dex, a new resource called an authrequest was created in the Kubernetes cluster, because we were using Kubernetes as the backend store for Dex. At the time of the outage, there were approximately 4 million authrequest resources in the cluster.
Additionally, we have a number of Flux instances deployed into our Kubernetes clusters which reconcile state. By default, Flux reconciles the state of the whole cluster (including all the authrequest resources) every 30 seconds.
This in turn was placing an incredible amount of load on our Kubernetes API servers, which could not handle it and fell over.
What were our mitigation steps?
We have implemented a number of mitigation steps, but I really want to highlight two that I think are particularly interesting:
Using etcd as a Dex backend rather than Kubernetes
We have migrated away from using Kubernetes as a backend store for Dex and instead moved to using an etcd cluster. We decided to use the Bitnami chart (https://github.com/bitnami/charts/tree/master/bitnami/etcd) to handle the deployment of this.
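As a sketch of what that deployment can look like (the parameter names vary between chart versions, so treat these values as an assumption and check the chart’s values.yaml):

```yaml
# Illustrative Helm values for the bitnami/etcd chart; key names may
# differ between chart versions.
replicaCount: 3
auth:
  rbac:
    enabled: true
    # Reference a pre-created Kubernetes Secret holding the root password,
    # rather than embedding the password in the values file.
    existingSecret: dex-etcd-credentials
```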
We randomly generated the etcd passwords and stored them as SealedSecrets in our kubernetes-secrets repository, then updated the Dex deployment to consume the secret as an environment variable. Finally, we changed the Dex configuration to look like the below:
Not only has this stopped resources (such as authrequests) from being created in the Kubernetes etcd cluster, but it has also improved the responsiveness of the Kubernetes API servers.
If you aren’t using etcd as a storage backend for Dex, the Mettle Platform team and I would highly recommend it.
Reducing what flux reconciles
The final change I would like to talk about is leveraging the --k8s-unsafe-exclude-resource flag within Flux. This flag tells Flux to avoid querying the cluster for the specified resources, which in turn means that Flux won’t take those excluded resources into account when syncing. For more information on this flag see here.
We have the following set (I have wrapped to make it easier to read)
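The value itself didn’t survive into this copy, but given the effect described in the next paragraph, it would be a pattern matching everything in the dex.coreos.com API group, roughly like this (the exact glob syntax is an assumption; verify it against the flag documentation for the Flux version you run):

```yaml
# Illustrative fluxd container args; the glob syntax for
# --k8s-unsafe-exclude-resource is an assumption, not our exact value.
args:
  - --k8s-unsafe-exclude-resource=dex.coreos.com/*
```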
This will now exclude all resources in the dex.coreos.com API group from being queried, which in turn greatly reduces the load on the Kubernetes API server.