How SSO brought a cluster to its knees!
Setting the scene
So imagine you have a cluster with publicly accessible endpoints, and you’re using a mixture of the NGINX Ingress Controller and OAuth2 Proxy to provide Single Sign-On capabilities for these endpoints.
The flow for accessing these endpoints is roughly as follows:
1. A request for a protected endpoint hits the NGINX Ingress Controller.
2. NGINX makes an auth subrequest to OAuth2 Proxy to check whether the request already carries a valid session.
3. If it does not, the user is redirected to Dex (our OpenID Connect provider) to sign in.
4. Once sign-in completes, OAuth2 Proxy sets a session cookie and the original request is passed through to the backing service.
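On the Ingress side, the wiring looks roughly like the snippet below. This is a minimal sketch rather than our exact configuration: the host names, namespace and Service are placeholders, but the auth-url/auth-signin annotations are the standard way of pointing the NGINX Ingress Controller at OAuth2 Proxy.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app                # hypothetical application
  namespace: my-namespace     # hypothetical namespace
  annotations:
    # Ask OAuth2 Proxy whether the request already has a valid session
    nginx.ingress.kubernetes.io/auth-url: "https://auth.example.com/oauth2/auth"
    # Where unauthenticated users are sent to start the sign-in flow (which hands off to Dex)
    nginx.ingress.kubernetes.io/auth-signin: "https://auth.example.com/oauth2/start?rd=$escaped_request_uri"
spec:
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80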
So, what happened?
We had a cascading failure caused by an exceptionally high number of Single Sign-On requests from somebody performing a dictionary attack on our endpoints.
For every request made to Dex, a new resource (called an authrequest) was created in the Kubernetes cluster, because we were using Kubernetes as a backend store for Dex. At the time of the outage, there were approximately 4 million authrequest resources in the Kubernetes cluster.
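If you ever find yourself in a similar position, you can get a feel for the scale of the problem by counting the authrequest objects directly. This is just an illustrative one-liner rather than something from our runbooks, and with millions of objects even this listing puts noticeable load on the API server, so use it sparingly:

kubectl get authrequests.dex.coreos.com --no-headers | wc -l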
Additionally, we have a number of Flux instances deployed into our Kubernetes clusters which reconcile state. By default, Flux reconciles the state of the whole cluster (including all the authrequest resources) every 30 seconds.
This in turn placed an incredible amount of load on our Kubernetes API servers, which could not handle it and fell over.
What were our mitigation steps?
We have implemented a number of mitigation steps, but I really want to highlight two that I think are particularly interesting:
Using etcd as a Dex backend rather than Kubernetes
We have migrated away from using Kubernetes as a backend store for Dex and now use a dedicated etcd cluster instead. We decided to use the Bitnami chart (https://github.com/bitnami/charts/tree/master/bitnami/etcd) to handle the deployment of this.
We randomly generated the etcd passwords and stored them as SealedSecrets in our kubernetes-secrets repository, then updated the Dex deployment to consume the secret as an environment variable. Finally, we changed the Dex configuration to look like the following:
storage:
  type: etcd
  config:
    endpoints:
      - etcd.oidc.svc.cluster.local:2379
    username: root
    password: $ETCD_ROOT_PASSWORD
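The $ETCD_ROOT_PASSWORD reference above is resolved from the Dex container’s environment, so the Deployment needs something along the following lines. The Secret name and key here are hypothetical; in our case the Secret is the one unsealed from the SealedSecret in the kubernetes-secrets repository.

env:
  - name: ETCD_ROOT_PASSWORD
    valueFrom:
      secretKeyRef:
        name: dex-etcd-credentials   # hypothetical name of the unsealed Secret
        key: root-password           # hypothetical key holding the generated password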
Not only has this stopped resources (such as authrequests) from being written into the Kubernetes cluster’s own etcd, it has also significantly improved the responsiveness of kubectl.
If you aren’t using etcd as a storage backend for Dex, the Mettle Platform team and I would highly recommend it.
Reducing what Flux reconciles
The final change I would like to talk about is leveraging the --k8s-unsafe-exclude-resource flag within Flux. This flag tells Flux to avoid querying the cluster for the specified resources, which in turn means that Flux won’t take those excluded resources into account when syncing. For more information on this flag, see the Flux documentation.
We have the following set (I have wrapped the value onto multiple lines to make it easier to read):
*metrics.k8s.io/*,
webhook.certmanager.k8s.io/*,
v1/Event,
*dex.coreos.com/*
This now excludes all resources in the dex.coreos.com API group from being queried, which in turn greatly reduces the load on the Kubernetes API server.
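For completeness, on the Flux daemon itself this ends up as a single container argument, roughly like the sketch below; the surrounding Deployment and any other args are omitted, and the container name is just illustrative.

containers:
  - name: flux
    args:
      # Keep high-churn or irrelevant API groups out of Flux's sync queries
      - --k8s-unsafe-exclude-resource=*metrics.k8s.io/*,webhook.certmanager.k8s.io/*,v1/Event,*dex.coreos.com/*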