The problem

Containerised applications running in Kubernetes frequently need access to other services in the cluster as well as to external AWS services, such as Amazon RDS or Amazon ElastiCache for Redis. On AWS, network-level access between services is often controlled with EC2 security groups. Until a few months ago, security groups could only be assigned at the node level, so every pod on a node shared the same security groups.

The saviour

Security groups for pods makes it easy to achieve network security compliance by running applications with varying network security requirements on shared compute resources. …
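
As a rough illustration of the feature, security groups are attached to pods through a SecurityGroupPolicy resource that selects pods by label. The sketch below is not taken from the original post: the policy name, namespace, label, and security group ID are all placeholders.

```yaml
# Minimal sketch of a SecurityGroupPolicy (all names and IDs are hypothetical).
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: rds-client-policy          # placeholder policy name
  namespace: example-namespace     # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: example-app             # pods carrying this label get the groups below
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0       # placeholder security group ID
```

Only pods matching the selector receive the listed security groups; other pods on the same node keep the node-level groups.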


Problem statement

The problem statement for this piece of work was as follows:

As a platform engineer
I want our workloads to be as secure as possible
So that we don’t leave ourselves open to exploitation

Recommended Tooling

A slight tangent, but I would recommend using the following tooling to help secure your cluster workloads.

I am planning to write another blog post on how we use kubeaudit above as part of our CI process.

Note: There are many more, but these are a good starting point.

The Helm Operator Problem

At Mettle we run a helm operator per namespace in each of our…


Setting the scene

So imagine you have a cluster with publicly accessible endpoints and you’re using a mixture of NGINX Ingress Controller and OAuth2 Proxy to provide Single Sign-On capabilities to these endpoints.

The flow for accessing these endpoints is as follows:


Problem statement

The problem statement for this piece of work was as follows:

As a platform engineer
I want to lock down Flux permissions to “just enough”
So that we keep the cluster as secure as possible

What is Flux?

Flux is a tool that automatically ensures that the state of a cluster matches the config in git. It uses an operator in the cluster to trigger deployments inside Kubernetes, which means you don’t need a separate CD tool.

It monitors all relevant image repositories, detects new images, triggers deployments, and updates the desired running configuration based on that (and a configurable policy).
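
To make the "configurable policy" concrete, here is a minimal sketch of a Deployment that opts in to automated image updates through Flux annotations. The workload name, container name, image, and tag filter are hypothetical and are not taken from our setup.

```yaml
# Sketch only: a Deployment opted in to Flux automated image updates.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                             # hypothetical workload name
  annotations:
    flux.weave.works/automated: "true"          # let Flux roll out new images
    flux.weave.works/tag.app: semver:~1.0       # policy for the container named "app"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app                             # must match the tag.<container> suffix
          image: example.com/example-app:1.0.0  # placeholder image
```

With a filter like this, Flux only considers image tags that satisfy the semver range when deciding whether to update the manifest in git.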

For more…


After Duffie’s TGIK episode a few weeks ago (see below) about Etcd, I thought it was only right for me to blog about how we implemented a resilient Etcd cluster at Mettle.

Problem statement

The problem statement for this piece of work was as follows:

As a platform engineer
I want to be confident in the self-healing nature of our Etcd cluster
So that it automatically heals without human intervention

The Problem

The Platform team’s aim at Mettle is for our platform to be able to automatically tolerate the failure of services, servers, and even whole parts of the system with no impact…


Feature statement

Providing the ability to snapshot Elasticsearch clusters to S3 on a regular cadence. For reference, we are using the managed Elasticsearch service in AWS and are currently on Elasticsearch version 6.4.2. We decided to write a Golang command-line tool to perform the snapshotting,

Our proposed solution / problem

However, we wanted to validate our tool locally before having to deploy it to our Kubernetes cluster in AWS.

After doing some searching online, my colleague Will Varney came across localstack (https://github.com/localstack/localstack).

LocalStack — A fully functional local AWS cloud stack

LocalStack provides an easy-to-use test/mocking framework for developing Cloud applications. …


Problem statement

The problem statement for this piece of work was as follows:

As an engineer
I want to be confident in my HelmReleases before they are deployed.
So that deploys are more likely to succeed than fail in Kubernetes.

The current problem

At Mettle, as discussed before, we follow GitOps principles religiously when deploying workloads to our Kubernetes clusters. However, in the past we have been bitten by invalid HelmRelease manifests being deployed, with Tiller swallowing the errors instead of bubbling them to the surface.

The migration to Helm 3 made these errors appear when describing the HelmRelease itself (see below)

Status: Conditions: Last…


Problem Statement

The problem statement for this piece of work was as follows:

As a platform engineer
I want new chart versions to be available as quickly as possible across all envs.
So that HelmReleases don’t fail on startup because the version does not exist.
And dependencies on HelmReleases are kept outside of the cluster.

Current Implementation

Before this work took place, we built a Helm registry container whenever Helm chart changes were merged into the master branch of k8s-helm-charts.

We then had a deployment inside each of our clusters which had the following Flux annotation applied to it:

annotations: flux.weave.works/automated…


Even though the above was made public this week, the Platform Team at Mettle always wants to be ahead of the curve. Therefore, the remainder of this blog post will talk about our journey to upgrade our clusters to v3.

Preparation

The following sections detail the prep work required before starting to upgrade.

HelmRelease: Default all to v2

As preparation, we made sure that all our HelmReleases specified the Helm version they use. This was done by adding the following:

spec:
  chart:
    name: backend
    repository: https://example.storage.googleapis.com
  releaseName: account-balance
  helmVersion: v2
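
For context, the snippet above sits inside a full HelmRelease manifest. The sketch below shows roughly where it lives; the apiVersion, metadata, and chart version are assumptions added for illustration and are not from our repository.

```yaml
# Sketch of a complete HelmRelease pinned to Helm v2 (metadata and version are hypothetical).
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: account-balance
  namespace: example-namespace   # placeholder namespace
spec:
  releaseName: account-balance
  helmVersion: v2                # explicit, so a change of operator default cannot affect it
  chart:
    repository: https://example.storage.googleapis.com
    name: backend
    version: 1.0.0               # placeholder chart version
```

Pinning every release explicitly means each one can later be moved to v3 individually by changing a single field.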

Helm Operator: Allow both v2 and v3 charts

We need to set the following options on our helm operator instance so that it’s…


One of the Platform team's biggest principles is making our platform self-service to engineers. GitOps has allowed us to make this possible.

At Mettle we fully leverage GitOps to deploy everything into our clusters. We chose Flux CD (https://github.com/fluxcd/flux) as our GitOps controller.

Flux is a tool that automatically ensures that the state of a cluster matches the config in git. It uses an operator in the cluster to trigger deployments inside Kubernetes, which means you don’t need a separate CD tool. …

Steven Wade

Independent Kubernetes Consultant & Trainer.
