EKS — Security Groups for Pods

Steve Wade
5 min read · Feb 13, 2021


The problem

Containerised applications running in Kubernetes frequently require access to other services within the cluster as well as to external AWS services, such as Amazon RDS or Amazon ElastiCache for Redis. On AWS, network-level access between services is commonly controlled with EC2 security groups. Until recently, you could only assign security groups at the node level, so every pod on a node shared the same security groups.

The saviour

Security groups for pods make it easy to achieve network security compliance by running applications with varying network security requirements on shared compute resources. Network security rules spanning pod-to-pod and pod-to-external-AWS-service traffic can be defined in a single place with EC2 security groups and applied to applications through Kubernetes-native APIs.

Considerations

The following is a list of considerations before embarking on this implementation:

  1. You must be running at least Kubernetes version 1.17 with platform version eks.3.
  2. Traffic to and from pods with associated security groups is not subject to Calico network policy enforcement and is limited to Amazon EC2 security group enforcement only. This will likely be fixed in a future release of Calico.
  3. Security groups for pods only work if the nodes themselves are in a private subnet configured with a NAT gateway or NAT instance. Pods with assigned security groups deployed to public subnets are not able to access the internet.
  4. Kubernetes services of type NodePort and LoadBalancer using instance targets with an externalTrafficPolicy set to Local are not supported for pods that you assign security groups to.
  5. This feature is only available on a selection of instance types within AWS. The full list can be found at the bottom of this page.
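To check the first consideration, you can query your cluster's Kubernetes and platform versions with the AWS CLI (the cluster name below is a placeholder):

```
aws eks describe-cluster \
  --name my-cluster \
  --query 'cluster.{version: version, platformVersion: platformVersion}'
```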

The implementation

So the bit you all came to read! How on earth do you implement this?

1. Cluster IAM permissions

Firstly we need to associate the following IAM policy with the IAM role attached to your EKS cluster.

arn:aws:iam::aws:policy/AmazonEKSVPCResourceController

This was done via the following snippet of Terraform code:

locals {
  cluster_policies = [
    "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy",
    "arn:aws:iam::aws:policy/AmazonEKSServicePolicy",
    "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController",
  ]
}

resource "aws_iam_role_policy_attachment" "k8s_master_attach" {
  for_each   = toset(local.cluster_policies)
  role       = aws_iam_role.k8s_master.name
  policy_arn = each.value
}
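To double-check that the policies ended up on the role, you can list its attachments with the AWS CLI (the role name below is a placeholder for whatever `aws_iam_role.k8s_master` is actually named in your setup):

```
aws iam list-attached-role-policies --role-name k8s-master
```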

2. AWS CNI adjustments

The following changes need to be made to the AWS CNI daemonset:

  1. As you are likely leveraging liveness and readiness probes within your deployments, we need to set DISABLE_TCP_EARLY_DEMUX to true so that the kubelet can connect to pods on branch network interfaces via TCP.
  2. We need to set ENABLE_POD_ENI to true. This adds the label vpc.amazonaws.com/has-trunk-attached to a node whenever it is possible to attach an additional ENI, and is the essential configuration required to enable security groups for pods.
initContainers:
  - name: aws-vpc-cni-init
    env:
      - name: DISABLE_TCP_EARLY_DEMUX
        value: "true"
containers:
  - name: aws-node
    env:
      - name: ENABLE_POD_ENI
        value: "true"
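If you prefer not to edit the daemonset manifest directly, the same settings can, to the best of my knowledge, be applied in place with kubectl (the patch targets the init container, which `kubectl set env` does not reach):

```
kubectl -n kube-system set env daemonset aws-node ENABLE_POD_ENI=true

kubectl -n kube-system patch daemonset aws-node \
  -p '{"spec":{"template":{"spec":{"initContainers":[{"name":"aws-vpc-cni-init","env":[{"name":"DISABLE_TCP_EARLY_DEMUX","value":"true"}]}]}}}}'
```

Be aware that changes made this way may be overwritten the next time the CNI manifest is re-applied, so persisting them in your manifests is preferable.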

You can validate that these changes have been successful by executing the following command:

kubectl get nodes -o wide -l vpc.amazonaws.com/has-trunk-attached=true

3. Security Group setup

The configuration of the security groups is the most important step for this whole thing to work.

Let’s take the scenario whereby we have a specific set of pods that require access to a specific RDS database within our setup.

I want to use the diagram below to aid our implementation:

Key:

RQ1: The pod security group requires outbound internet access.

RQ2: Our ingress nodes need inbound access to the pod security group to route external traffic.

RQ3: The pod security group needs outbound access to the RDS security group to allow communication with the database.

RQ4: The pod security group has to resolve DNS via the worker security group.

RQ5: The worker security group needs access to the pod security group for liveness and readiness probes to work.

Terraform code

The following snippets are taken from our rds-mysql module.

RDS security group

# ==========================
# Security for RDS cluster
# ==========================

resource "aws_security_group" "rds" {
  name        = var.name
  description = "Default security group for ${var.name}."
  vpc_id      = var.vpc_id

  tags = {
    Name      = var.name
    Region    = data.aws_region.current.name
    Platform  = var.platform
    CreatedBy = "Terraform"
  }
}

resource "aws_security_group_rule" "rds_ingress_from_pods" {
  description              = "Ingress: Allow access from ${aws_security_group.pods.name}."
  type                     = "ingress"
  from_port                = 3306
  to_port                  = 3306
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds.id
  source_security_group_id = aws_security_group.pods.id
}

resource "aws_security_group_rule" "rds_ingress_to_self" {
  description       = "Ingress: Allow access to itself."
  type              = "ingress"
  self              = true
  from_port         = 3306
  to_port           = 3306
  protocol          = "tcp"
  security_group_id = aws_security_group.rds.id
}

Pod security group

# ==========================
# Security for pods to leverage
# ==========================

resource "aws_security_group" "pods" {
  name        = "${var.name}-for-pods"
  description = "Default security group for ${var.name}-for-pods."
  vpc_id      = var.vpc_id

  tags = {
    Name      = "${var.name}-for-pods"
    Region    = data.aws_region.current.name
    Platform  = var.platform
    CreatedBy = "Terraform"
  }
}

resource "aws_security_group_rule" "pods_ingress_all_tcp_from_worker_security_group" {
  description              = "Ingress: Allow all TCP from the worker security group (required for probes to work)."
  type                     = "ingress"
  from_port                = 0
  to_port                  = 65535
  protocol                 = "tcp"
  security_group_id        = aws_security_group.pods.id
  source_security_group_id = var.security_groups["k8s_worker"]
}

resource "aws_security_group_rule" "pods_ingress_all_tcp_from_ingress_security_group" {
  description              = "Ingress: Allow all TCP from the ingress security group (required for ingress to work)."
  type                     = "ingress"
  from_port                = 0
  to_port                  = 65535
  protocol                 = "tcp"
  security_group_id        = aws_security_group.pods.id
  source_security_group_id = var.security_groups["k8s_ingress"]
}

resource "aws_security_group_rule" "pods_egress_to_rds" {
  description       = "Egress: Allow access to ${aws_security_group.rds.name}."
  type              = "egress"
  from_port         = 3306
  to_port           = 3306
  protocol          = "tcp"
  security_group_id = aws_security_group.pods.id

  # Despite the name, for egress rules source_security_group_id
  # identifies the *destination* security group.
  source_security_group_id = aws_security_group.rds.id
}

resource "aws_security_group_rule" "pods_egress_https_all" {
  description       = "Egress: HTTPS"
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  security_group_id = aws_security_group.pods.id
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "pods_egress_http_all" {
  description       = "Egress: HTTP"
  type              = "egress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  security_group_id = aws_security_group.pods.id
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "pods_egress_dns_tcp_to_worker_security_group" {
  description              = "Egress: Allow DNS resolution (TCP) via worker security group."
  type                     = "egress"
  from_port                = 53
  to_port                  = 53
  protocol                 = "tcp"
  security_group_id        = aws_security_group.pods.id
  source_security_group_id = var.security_groups["k8s_worker"]
}

resource "aws_security_group_rule" "pods_egress_dns_udp_to_worker_security_group" {
  description              = "Egress: Allow DNS resolution (UDP) via worker security group."
  type                     = "egress"
  from_port                = 53
  to_port                  = 53
  protocol                 = "udp"
  security_group_id        = aws_security_group.pods.id
  source_security_group_id = var.security_groups["k8s_worker"]
}

Worker security group updates

resource "aws_security_group_rule" "worker_ingress_dns_tcp_from_pods_security_group" {
  description              = "Ingress: Allow DNS resolution (via TCP) from ${aws_security_group.pods.name}."
  type                     = "ingress"
  from_port                = 53
  to_port                  = 53
  protocol                 = "tcp"
  security_group_id        = var.security_groups["k8s_worker"]
  source_security_group_id = aws_security_group.pods.id
}

resource "aws_security_group_rule" "worker_ingress_dns_udp_from_pods_security_group" {
  description              = "Ingress: Allow DNS resolution (via UDP) from ${aws_security_group.pods.name}."
  type                     = "ingress"
  from_port                = 53
  to_port                  = 53
  protocol                 = "udp"
  security_group_id        = var.security_groups["k8s_worker"]
  source_security_group_id = aws_security_group.pods.id
}
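With the security groups in place, the remaining step (not covered by the Terraform above) is telling the VPC resource controller which pods should receive the pod security group. This is done with a SecurityGroupPolicy custom resource; the name, namespace, label selector, and group ID below are placeholders — the group ID should be that of the "pods" security group created above:

```
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: my-app-sgp
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      app: my-app
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0   # placeholder: the "pods" security group ID
```

Note that the policy only applies to pods created after it exists; pods already running keep their previous networking until they are recreated.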

Conclusion

Every company has its own security and compliance policies, some of which are tightly coupled to security groups. I found it very easy to configure our clusters to use security groups for pods, and I don't believe any engineer will struggle with it. However, this is yet another Kubernetes resource, one which further expands and effectively complicates various configurations.

The one important gotcha is that the worker security group needs access to the pod security group on any ports used for liveness and readiness probes. This caught me out for a little while, which is why I have included it here as a mandatory step.
