Speed Up Deployments to Amazon ECS with a Multi-Container Approach
How to streamline a deployment pipeline bogged down by container bloat
Our use case
All developers have been there: firing up the development pipeline for the fourth time in a row just to test a small change that is failing in one of the last steps. Unfortunately, the base container image used in this pipeline has grown bloated, and builds now take significant time. It would be nice to move some of the static files out of the application container and into backend storage, but management has made it clear that performance and cost savings are paramount.
In an effort to alleviate this pain for a particular development group, a DevOps team I worked with leveraged existing capabilities within Amazon Elastic Container Service (ECS). The idea was to divorce the static files from the application code, but in a way that kept those files local to the application.
(It should be noted that this works best with Fargate as the backend for the containers, but it may still be applicable with EC2 backends.)
How ECS allows this
When creating ECS Task Definitions, it is generally a good idea to keep them simple; AWS best practices even recommend avoiding multiple containers per Task. However, the main advantage of a multi-container Task (both containers running on the same host) is something we can leverage.
If we take all the static files that rarely change from release to release and move them into a separate "data" container, we can keep our application image lean. And since ECS natively supports multi-container Task Definitions, the setup is simple.
Dividing the original container
Say your original application Dockerfile looks something like this:
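A minimal sketch of what that might look like, assuming an nginx-based site where the bulky, rarely changing static assets are baked into the same image as the application configuration (image tags and paths here are illustrative):

# Single bloated image: application config and all static assets together
FROM nginx:1.25
COPY nginx.conf /etc/nginx/nginx.conf
# Hundreds of MB of rarely changing assets, rebuilt and re-pushed on every code change
COPY static/ /usr/share/nginx/html/

To divide it, the rarely changing assets move into their own "data" image, while the application image keeps only what actually changes from release to release (again, a sketch under the same assumptions):

# Lean application image: configuration and code only
FROM nginx:1.25
COPY nginx.conf /etc/nginx/nginx.conf

# "data" image: static assets only, placed at the path the Task Definition mounts below.
# busybox's default command exits immediately, which is all this container needs to do.
FROM busybox:1.36
COPY static/ /usr/share/nginx/html/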
To change the Task Definition, we simply need to add the "data" container and define the volume locations:
{"taskDefinitionArn":"arn:aws:ecs:us-east-1::task-definition/multi-container-task:2","containerDefinitions":[{"name":"nginx","image": <your lean application image>,"portMappings":[{"name":"nginx-80-tcp","containerPort":80,"hostPort":80,"protocol":"tcp","appProtocol":"http"}],"essential":true,"environment":[],"environmentFiles":[],"mountPoints":[],"volumesFrom":[{"sourceContainer":"data_source","readOnly":false}],"dependsOn":[{"containerName":"data_source","condition":"SUCCESS"}],"ulimits":[],"logConfiguration":{"logDriver":"awslogs","options":{"awslogs-group":"/ecs/","awslogs-create-group":"true","awslogs-region":"us-east-1","awslogs-stream-prefix":"ecs"},"secretOptions":[]},"systemControls":[]},{"name":"data_source","image": <your data image>,"portMappings":[],"essential":false,"environment":[],"environmentFiles":[],"mountPoints":[{"sourceVolume":"nginx_data_source","containerPath":"/usr/share/nginx/html","readOnly":false}],"volumesFrom":[],"systemControls":[]}],"family":"multi-container-task","taskRoleArn":"arn:aws:iam:::role/ecs-task-ecr-read","executionRoleArn":"arn:aws:iam:::role/ecs-task-ecr-read","networkMode":"awsvpc","revision":2,"volumes":[{"name":"nginx_data_source","host":{}}],"status":"ACTIVE","requiresAttributes":[{"name":"com.amazonaws.ecs.capability.logging-driver.awslogs"},{"name":"ecs.capability.execution-role-awslogs"},{"name":"com.amazonaws.ecs.capability.docker-remote-api.1.19"},{"name":"com.amazonaws.ecs.capability.ecr-auth"},{"name":"com.amazonaws.ecs.capability.task-iam-role"},{"name":"ecs.capability.container-ordering"},{"name":"ecs.capability.execution-role-ecr-pull"},{"name":"com.amazonaws.ecs.capability.docker-remote-api.1.18"},{"name":"ecs.capability.task-eni"},{"name":"com.amazonaws.ecs.capability.docker-remote-api.1.29"}],"placementConstraints":[],"compatibilities":["EC2","FARGATE"],"requiresCompatibilities":["FARGATE"],"cpu":"256","memory":"1024","runtimePlatform":{"cpuArchitecture":"X86_64","operatingSystemFamily":"LINUX"},"tags":[]}
When this new task runs, the following occurs:
The "data" container starts first; because it has no long-running command, it exits gracefully almost immediately and leaves all Task resources to the application container
The app container waits for the "data" container to finish successfully (the SUCCESS condition on dependsOn)
Once that dependency is satisfied, the app container starts and mounts the shared volume locally through volumesFrom
What do we gain?
The main advantage is that our application image should be much leaner, allowing code changes to build faster. With small to medium images a user is unlikely to notice a difference, but here are some example build times as the image size grows:
Size       Build time
150 MB     8.73s
750 MB     44.66s
2.25 GB    147.53s
Of course, each project should be evaluated separately based on its own build times and build frequencies. But if your static files are a significant portion of the container image, the cumulative effect can be game-changing.
What is the cost?
First, let us mention what we are not sacrificing: application performance or scale-up time.
Load tests against the two setups showed no appreciable difference, regardless of file size.
For scale-up times, the multi-container setup showed no difference regardless of the size of the static files (tested from 80 MB to 300 MB).
The main cost comes from the added deployment complexity. There will now need to be a pipeline that takes two repositories as inputs, and several design questions follow from that:
How do you match environments in the pipelines? Does an application Staging environment use the "data" from Staging or Production?
Do you trigger full builds for changes to the "data" container? Do you have a different set of acceptance criteria?
Do you separate the "data" across multiple volumes? Does each volume get its own repository?
How these questions get answered can vary from project to project.
What about alternatives?
There are other ways to keep the application container lean, but this multi-container method compares favorably to those.
Mounting the static files with an EFS mount
With an EFS mount, your entire ECS Task can be pared down to a lean container image, which would maximize the speed of Task scale-up. However, your Tasks now need to reach over a network to access their static files, which introduces I/O latency and transfer costs. Additionally, EFS itself is an opaque data source that requires extra steps to view and update files, and it has no built-in versioning or history the way data in a repository would.
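For comparison, a minimal sketch of how the volumes block of the Task Definition would change with EFS (the file system ID below is a placeholder, and transit encryption is optional):

"volumes": [
  {
    "name": "nginx_data_source",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-0123456789abcdef0",
      "rootDirectory": "/",
      "transitEncryption": "ENABLED"
    }
  }
]

The application container would then reference this volume with an ordinary mountPoints entry, and the "data" container and volumesFrom wiring would no longer be needed.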
Using S3 as a store for static files
An S3 solution would require one of the following approaches:
pulling the static files into the application on demand, which would perform worse than EFS
redirecting to S3 directly, which requires a public-facing bucket and is generally less flexible when coding the application
Wrap-up
Though this approach does require some refactoring of the application and build process, the downsides are limited. Ultimately, it adds the most value when it makes a pipeline fast enough that developers actively watch the build rather than context switching to email and other distractions. Even our DevOps processes must compete in the attention economy!