Localstack and two other containers!
Feature statement
Provide the ability to snapshot Elasticsearch clusters to S3 on a regular cadence. For reference, we are using the managed Elasticsearch service in AWS and are currently on Elasticsearch version 6.4.2.
Our proposed solution / problem
We decided to write a Golang command-line tool to perform the snapshotting. However, we wanted to validate the tool locally before deploying it to our Kubernetes cluster in AWS.
After some searching online, my colleague Will Varney came across localstack (https://github.com/localstack/localstack):
LocalStack — A fully functional local AWS cloud stack
LocalStack provides an easy-to-use test/mocking framework for developing Cloud applications. Currently, the focus is primarily on supporting the AWS cloud stack.
I would also recommend leveraging awscli-local (https://github.com/localstack/awscli-local) when using localstack, as it saves you from having to mess around with your awscli configuration to interact with it.
This package provides the awslocal command, which is a thin wrapper around the aws command-line interface for use with LocalStack.
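For example, once localstack is running you can create and inspect buckets without touching your real AWS credentials (the bucket name here is purely illustrative):

```shell
# Create a bucket in the local S3 rather than real AWS
awslocal s3 mb s3://test-bucket

# List buckets to confirm it exists
awslocal s3 ls
```

These commands assume the localstack container from the docker-compose setup below is up and exposing its S3 port locally.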
Leveraging docker-compose
Will and I then set about constructing a docker-compose file that would allow us to simulate both an Elasticsearch cluster and an S3 bucket running locally.
Initial attempt
The docker-compose.yml file below shows both the elasticsearch and localstack configuration.
version: '3'
services:
  elasticsearch:
    build: .
    container_name: elasticsearch
    volumes:
      - ./es.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - 9200:9200
    environment:
      discovery.type: single-node
    links:
      - "localstack:localstack"
    restart: always
  localstack:
    image: localstack/localstack:0.11.0
    container_name: localstack
    ports:
      - '4563-4599:4563-4599'
      - '8055:8080'
    environment:
      - SERVICES=s3
      - DEFAULT_REGION=eu-west-2
      - DATA_DIR=/tmp/localstack/data
      - AWS_ACCESS_KEY_ID=1234
      - AWS_SECRET_ACCESS_KEY=1234
    volumes:
      - './.localstack:/tmp/localstack'
You will notice above that elasticsearch is built from a Dockerfile rather than just specifying an image. The Dockerfile consists of the open-source image for Elasticsearch 6.4.2 plus the installation of the repository-s3 plugin, which allows us to snapshot to S3 (see below).
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.2
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch repository-s3
The last piece of the puzzle is the configuration of Elasticsearch via the elasticsearch.yml file; its contents can be seen below:
cluster.name: "docker-cluster"
network.host: 0.0.0.0
# Configuration to allow for local S3 (for localstack)
s3.client.default.endpoint: localstack:4572
s3.client.default.protocol: http
Note: Port 4572 is the port localstack exposes for access to S3; this can be found in the overview section of their README file.
Everything looked good until we actually came to interact with our S3 bucket from within Elasticsearch itself. We were getting a 401 Unauthorized error.
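For context, "interacting with the bucket" here means registering an S3 snapshot repository with Elasticsearch. A request along these lines (the repository and bucket names are illustrative) is what was coming back with the 401:

```shell
# Register an S3 snapshot repository backed by the localstack bucket
curl -X PUT 'http://localhost:9200/_snapshot/local_backup' \
  -H 'Content-Type: application/json' \
  -d '{
    "type": "s3",
    "settings": {
      "bucket": "test-bucket"
    }
  }'
```

This assumes the elasticsearch container is up and the bucket already exists in localstack.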
The journey to a 200
The remainder of this post talks about the changes required to get a successful connection between Elasticsearch and S3.
The need to leverage the elasticsearch keystore
The client that you use to connect to S3 has a number of settings available. The settings have the form s3.client.CLIENT_NAME.SETTING_NAME. By default, s3 repositories use a client named default, but this can be modified using the repository setting client.
Most client settings can be added to the elasticsearch.yml
configuration file with the exception of the secure settings, which you add to the Elasticsearch keystore. For more information about creating and updating the Elasticsearch keystore, see Secure settings.
For example, if you want to use specific credentials to access S3 then run the following commands to add these credentials to the keystore:
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
bin/elasticsearch-keystore add s3.client.default.session_token
We therefore added these commands to our Elasticsearch Dockerfile, but to no avail: the keystore was wiped once the elasticsearch binary in the container started up.
Will and I then decided to exec
into the running container, execute the keystore add
commands locally, and then use the docker cp
command to copy the Keystore file from the container to our local machine.
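The process above can be sketched roughly as follows (the --stdin flag and the es.keystore filename are assumptions for illustration; the values piped in must match the localstack credentials):

```shell
# Inside the running container, add the credentials to the keystore
docker exec elasticsearch bash -c \
  'echo 1234 | bin/elasticsearch-keystore add --stdin s3.client.default.access_key'
docker exec elasticsearch bash -c \
  'echo 1234 | bin/elasticsearch-keystore add --stdin s3.client.default.secret_key'

# Copy the populated keystore out of the container onto the host,
# so it can later be mounted back in at startup
docker cp elasticsearch:/usr/share/elasticsearch/config/elasticsearch.keystore ./es.keystore
```

These commands assume the elasticsearch container from the docker-compose file is already running.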
The values used for the keystore commands need to match the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY specified in the localstack configuration within the docker-compose.yml file.
We could then mount the keystore inside the container at runtime, which meant it wouldn't be overridden. The elasticsearch section of the docker-compose.yml file now looked like this:
elasticsearch:
  build: .
  container_name: elasticsearch
  volumes:
    - ./es.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    - ./es.keystore:/usr/share/elasticsearch/config/elasticsearch.keystore
  ports:
    - 9200:9200
  environment:
    discovery.type: single-node
  links:
    - "localstack:localstack"
  restart: always
Good old Docker networking!
As you saw earlier in this post, we were using localstack:4572 as the default endpoint for Elasticsearch to use when talking to S3 (see below):
# Configuration to allow for local S3 (for localstack)
s3.client.default.endpoint: localstack:4572
s3.client.default.protocol: http
After reading further into the documentation, we came across the following:
There are a number of storage systems that provide an S3-compatible API, and the repository-s3 plugin allows you to use these systems in place of AWS S3. To do so, you should set the s3.client.CLIENT_NAME.endpoint setting to the system's endpoint. This setting accepts IP addresses and hostnames and may include a port. For example, the endpoint may be 172.17.0.2 or 172.17.0.2:9000.
Therefore we decided to give both Elasticsearch and localstack dedicated IP addresses to communicate over.
We did this because we needed path-style access to the S3 bucket, similar to what's discussed here. Otherwise the client uses DNS-based (virtual-hosted) bucket addressing, which fails as we're not actually using S3. Specifying an IP address causes the underlying AWS Java SDK that the plugin uses to fall back to path-style access.
To configure this, we added a networks configuration option within docker-compose, appending the section below to the bottom of our docker-compose.yml file.
networks:
  testing_net:
    ipam:
      driver: default
      config:
        - subnet: 172.28.0.0/16
For more information on docker-compose networking see https://docs.docker.com/compose/networking/#specify-custom-networks.
We then had to hard-code the IP addresses for each of the containers (see below):
elasticsearch:
  build: .
  container_name: elasticsearch
  ...
  networks:
    testing_net:
      ipv4_address: 172.28.1.1
localstack:
  image: localstack/localstack:0.11.0
  container_name: localstack
  ...
  networks:
    testing_net:
      ipv4_address: 172.28.1.2
The final piece of the puzzle was to hard-code the default s3 endpoint in the elasticsearch configuration as follows:
s3.client.default.endpoint: 172.28.1.2:4572
After making these changes, elasticsearch was finally able to communicate successfully with our S3 bucket running on localstack, meaning we at last had a local test rig to code and prototype against!
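To sanity-check the whole setup end to end, something like the following should now succeed (the bucket, repository, and snapshot names are illustrative):

```shell
# Create the target bucket in localstack
awslocal s3 mb s3://test-bucket

# Register the snapshot repository in Elasticsearch
curl -X PUT 'http://localhost:9200/_snapshot/local_backup' \
  -H 'Content-Type: application/json' \
  -d '{"type": "s3", "settings": {"bucket": "test-bucket"}}'

# Take a snapshot of all indices and wait for it to finish
curl -X PUT 'http://localhost:9200/_snapshot/local_backup/snapshot_1?wait_for_completion=true'

# The snapshot files should now be visible in the bucket
awslocal s3 ls s3://test-bucket --recursive
```

This assumes both containers are up with the networking and keystore changes described above in place.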
Summary
After working through these problems, we now have a local setup of Elasticsearch communicating with S3 that we can prototype against without needing to deploy our Go code into AWS for testing.
This shortened our feedback loop significantly and allowed us to ship the snapshotter in just over two days (around half a day of which was spent figuring out the answer to the 401 Unauthorized errors).
If you need to write code that interacts with AWS, I can highly recommend leveraging localstack for local development.