EKS — Security Groups for Pods
The problem
Containerised applications running in Kubernetes frequently require access to other services within the cluster as well as to external AWS services, such as Amazon RDS or Amazon ElastiCache for Redis. On AWS, network-level access between services is typically controlled with EC2 security groups. Until recently, you could only assign security groups at the node level, so every pod on a node shared the same security groups.
The saviour
Security groups for pods makes it easy to achieve network security compliance by running applications with varying network security requirements on shared compute resources. Network security rules that span pod to pod and pod to external AWS service traffic can be defined in a single place with EC2 security groups, and applied to applications with Kubernetes native APIs.
Considerations
The following is a list of considerations before embarking on this implementation:
- Your cluster must be running at least Kubernetes version 1.17 with platform version `eks.3`.
- Traffic to and from pods with associated security groups is not subjected to Calico network policy enforcement; it is limited to Amazon EC2 security group enforcement only. This will likely be addressed in a future release of Calico.
- Security groups for pods only work if the nodes themselves are in a private subnet configured with a NAT gateway or instance. Pods with assigned security groups deployed to public subnets are not able to access the internet.
- Kubernetes services of type `NodePort` and `LoadBalancer` using instance targets with `externalTrafficPolicy` set to `Local` are not supported with pods that you assign security groups to.
- This feature is only available on a selection of instance types within AWS. The full list can be found at the bottom of this page.
The implementation
So the bit you all came to read! How on earth do you implement this?
1. Cluster IAM permissions
Firstly we need to associate the following IAM policy with the IAM role attached to your EKS cluster.
arn:aws:iam::aws:policy/AmazonEKSVPCResourceController
This can be done via the following snippet of Terraform code:
locals {
  cluster_policies = [
    "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy",
    "arn:aws:iam::aws:policy/AmazonEKSServicePolicy",
    "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController",
  ]
}

resource "aws_iam_role_policy_attachment" "k8s_master_attach" {
  for_each   = toset(local.cluster_policies)
  role       = aws_iam_role.k8s_master.name
  policy_arn = each.value
}
2. AWS CNI adjustments
The following changes need to be made to the AWS CNI DaemonSet:
- As you are likely leveraging liveness and readiness probes within your deployments, we need to set `DISABLE_TCP_EARLY_DEMUX` to `true` so that the `kubelet` can connect to pods on branch network interfaces via TCP.
- We need to set `ENABLE_POD_ENI` to `true`. This will add the label `vpc.amazonaws.com/has-trunk-attached` to the node if it is possible to attach an additional ENI, and it is an essential configuration required to enable security groups for pods.
initContainers:
  - name: aws-vpc-cni-init
    env:
      - name: DISABLE_TCP_EARLY_DEMUX
        value: "true"
containers:
  - name: aws-node
    env:
      - name: ENABLE_POD_ENI
        value: "true"
You can validate that these changes have been successful by executing the following command:
kubectl get nodes -o wide -l vpc.amazonaws.com/has-trunk-attached=true
3. Security Group setup
The configuration of the security groups is the most important step for this whole thing to work.
Let’s take the scenario whereby we have a specific set of pods that require access to a specific RDS database within our setup.
I want to use the diagram below to aid our implementation:
Key:
RQ1 — The pod security group requires outbound internet access.
RQ2 — Our ingress nodes need inbound access to our pod security group to route external traffic.
RQ3 — The pod security group needs outbound access to the RDS security group to allow communication with the database.
RQ4 — The pod security group has to resolve DNS by using the worker security group.
RQ5 — The worker security group needs access to the pod security group for liveness and readiness probes to work.
Terraform code
The following snippets are taken from our `rds-mysql` module.
RDS security group
# ==========================
# Security for RDS cluster
# ==========================
resource "aws_security_group" "rds" {
name = "${var.name}"
description = "Default security group for ${var.name}."
vpc_id = var.vpc_id
tags = {
Name = "${var.name}"
Region = data.aws_region.current.name
Platform = var.platform
CreatedBy = "Terraform"
}
}
resource "aws_security_group_rule" "rds_ingress_from_pods" {
description = "Ingress: Allow access from ${aws_security_group.pods.name}."
from_port = 3306
protocol = "tcp"
security_group_id = aws_security_group.rds.id
source_security_group_id = aws_security_group.pods.id
to_port = 3306
type = "ingress"
}
resource "aws_security_group_rule" "rds_ingress_to_self" {
description = "Ingress: Allow access to itself."
self = true
from_port = 3306
protocol = "tcp"
security_group_id = aws_security_group.rds.id
to_port = 3306
type = "ingress"
}
Pod security group
# ==========================
# Security for pods to leverage
# ==========================
resource "aws_security_group" "pods" {
name = "${var.name}-for-pods"
description = "Default security group for ${var.name}-for-pods."
vpc_id = var.vpc_id
tags = {
Name = "${var.name}-for-pods"
Region = data.aws_region.current.name
Platform = var.platform
CreatedBy = "Terraform"
}
}
resource "aws_security_group_rule" "pods_ingress_all_tcp_from_worker_security_group" {
description = "Ingress: Allow all TCP from the worker security group (required for probes to work)."
from_port = 0
protocol = "tcp"
security_group_id = aws_security_group.pods.id
source_security_group_id = var.security_groups["k8s_worker"]
to_port = 65535
type = "ingress"
}
resource "aws_security_group_rule" "pods_ingress_all_tcp_from_ingress_security_group" {
description = "Ingress: Allow all TCP from the ingress security group (required for ingress to work)."
from_port = 0
protocol = "tcp"
security_group_id = aws_security_group.pods.id
source_security_group_id = var.security_groups["k8s_ingress"]
to_port = 65535
type = "ingress"
}
resource "aws_security_group_rule" "pods_egress_to_rds" {
description = "Egress: Allow access to ${aws_security_group.rds.name}."
from_port = 3306
protocol = "tcp"
security_group_id = aws_security_group.pods.id
source_security_group_id = aws_security_group.rds.id
to_port = 3306
type = "egress"
}
resource "aws_security_group_rule" "pods_egress_https_all" {
description = "Egress: HTTPS"
from_port = 443
protocol = "tcp"
security_group_id = aws_security_group.pods.id
cidr_blocks = ["0.0.0.0/0"]
to_port = 443
type = "egress"
}
resource "aws_security_group_rule" "pods_egress_http_all" {
description = "Egress: HTTP"
from_port = 80
protocol = "tcp"
security_group_id = aws_security_group.pods.id
cidr_blocks = ["0.0.0.0/0"]
to_port = 80
type = "egress"
}
resource "aws_security_group_rule" "pods_egress_dns_tcp_to_worker_security_group" {
description = "Egress: Allow DNS resolution (TCP) via worker security group"
from_port = 53
protocol = "tcp"
security_group_id = aws_security_group.pods.id
source_security_group_id = var.security_groups["k8s_worker"]
to_port = 53
type = "egress"
}
resource "aws_security_group_rule" "pods_egress_dns_udp_to_worker_security_group" {
description = "Egress: Allow DNS resolution (UDP) via worker security group"
from_port = 53
protocol = "udp"
security_group_id = aws_security_group.pods.id
source_security_group_id = var.security_groups["k8s_worker"]
to_port = 53
type = "egress"
}
Worker security group updates
resource "aws_security_group_rule" "worker_ingress_dns_tcp_from_pods_security_group" {
description = "Ingress: Allow DNS resolution (via TCP) from ${aws_security_group.pods.name}."
from_port = 53
protocol = "tcp"
security_group_id = var.security_groups["k8s_worker"]
source_security_group_id = aws_security_group.pods.id
to_port = 53
type = "ingress"
}
resource "aws_security_group_rule" "worker_ingress_dns_udp_from_pods_security_group" {
description = "Ingress: Allow DNS resolution (via UDP) from ${aws_security_group.pods.name}."
from_port = 53
protocol = "udp"
security_group_id = var.security_groups["k8s_worker"]
source_security_group_id = aws_security_group.pods.id
to_port = 53
type = "ingress"
}
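With the security groups in place, the final piece is to tell EKS which pods should receive the pod security group. This is done with the `SecurityGroupPolicy` custom resource provided by the VPC resource controller. A minimal sketch — the name, namespace, labels, and security group ID below are illustrative placeholders:

```yaml
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: my-app            # illustrative
  namespace: my-namespace # illustrative
spec:
  podSelector:
    matchLabels:
      app: my-app         # pods carrying this label receive the security group
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0 # e.g. the ID of aws_security_group.pods
```

Any pod scheduled in that namespace matching the `podSelector` will then be attached to a branch network interface with the listed security group applied.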
Conclusion
Every company has its own security and compliance policies, some of which are tightly coupled to security groups. I found it very easy to configure our clusters to use security groups for pods, and I don’t believe any engineer will struggle with it. However, this is yet another Kubernetes resource which further expands, and effectively complicates, various configurations.
The one important gotcha is the need for the worker security group to be able to reach the pod security group on any ports used by liveness and readiness probes. This caught me out for a little while, which is why I have included it here as a mandatory step.
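If the all-TCP rule from the worker security group feels too broad, RQ5 can instead be satisfied by opening only the probe ports. A sketch in the style of the module above, assuming the application’s probes listen on port 8080 (an illustrative value):

```hcl
# Illustrative alternative to the all-TCP ingress rule: allow only kubelet
# probe traffic. Assumes liveness/readiness probes listen on port 8080.
resource "aws_security_group_rule" "pods_ingress_probes_from_worker_security_group" {
  description              = "Ingress: Allow kubelet probe traffic from the worker security group."
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 8080
  to_port                  = 8080
  security_group_id        = aws_security_group.pods.id
  source_security_group_id = var.security_groups["k8s_worker"]
}
```

The trade-off is that every probe port used by your workloads must be listed explicitly, so the all-TCP rule remains the simpler option on shared clusters.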