What is Containerization, and Why Now Is the Time to Adopt It
It’s hard to find a current job listing for a software operations position that doesn’t contain words like “Docker”, “Terraform”, or “Kubernetes”. Containers, briefly summarized as a newly popular way to package and deploy software in isolated, reproducible environments, have become the default way many new companies handle their software operations in a cloud-native era of computing.
Containerization has been on an explosive upward trajectory since 2013, when Docker, the foremost containerization software, was first open sourced¹. Cloud computing, once seen as merely the saving grace of small companies unable to run their own data centers, has become so flexible that even large companies can justify its cost. Docker quickly became the uncontested standard for container software, thanks in part to backing from large companies like Google, leading to far more consensus across companies on how to run a containerized build system. Today, Kubernetes and infrastructure-as-code tools can stand up a company’s entire infrastructure from the ground up with little more than a set of YAML files, giving early adopters of container software a previously unimaginable degree of speed and flexibility.
Defining Containers
Docker itself defines containers as “a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.” Containers behave like miniature self-contained computers built specifically to host a particular application, rather than being general purpose like your own personal computer, and because of this they can be tailored to meet only that application’s needs. This mitigates many resource-contention problems, such as resource exhaustion and noisy-neighbor effects, by giving each application an environment custom built to supply its expected maximum resources without having to compete with other applications.
Each container is instantiated from an image, a central source of truth, shareable between users, that defines the contents of the containers created from it. Images are stored in online registries like AWS’s Elastic Container Registry, which allow developers to push new versions of their changes over time and roll back to previous images as errors are discovered. Commands can be executed within a running container, but unless those changes are committed to a new image and pushed to a registry, the next container will start from the exact same state as the previous one. This degree of standardization is not only useful from an operations point of view, but also fantastic for providing a base from which your application can be deployed without requiring users to install its dependencies themselves.
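As a concrete sketch of that workflow, the hypothetical commands below build a new image version, push it to an ECR registry, and roll back by pulling an earlier tag. The account ID, region, image name, and version tags are all placeholders.

    # Authenticate Docker to a hypothetical ECR registry (AWS CLI v2)
    aws ecr get-login-password --region us-east-1 \
      | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

    # Build a new image version from the local Dockerfile and push it
    docker build -t my-app:1.4.0 .
    docker tag my-app:1.4.0 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:1.4.0
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:1.4.0

    # Rolling back is just a matter of pulling and running an earlier tag
    docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:1.3.2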
To leverage containers for operations, many teams turn to the Kubernetes orchestration software, which further abstracts containers by allowing them to be deployed systematically across many different pieces of hardware. Kubernetes is the de facto standard for container orchestration, raising containers to a higher level of abstraction that can be deployed for high reliability and dynamic scaling, especially when paired with cloud computing. The basic unit of deployment in Kubernetes is the pod, which encapsulates not only the containers that host an application but also basic logging, automatic restarts when containers stop functioning, references to Persistent Volume Claims that provide durable storage independent of the containers themselves, and much more.
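A minimal pod manifest, using hypothetical names, might look like the following sketch: the container is restarted if it exits, and the Persistent Volume Claim it mounts outlives the container itself.

    apiVersion: v1
    kind: Pod
    metadata:
      name: demo-app
    spec:
      restartPolicy: Always          # the kubelet restarts the container if it stops
      containers:
        - name: web
          image: registry.example.com/demo-app:1.4.0
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: data
              mountPath: /var/lib/demo
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: demo-data     # storage that persists independently of the container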
How We Got Here
The idea of partitioning and standardizing computers has been around nearly as long as the modern computer, beginning with the Compatible Time-Sharing System pioneered at MIT as early as 1961, which gave multiple users the illusion of using the same computer concurrently by rapidly switching between the commands each user issued. By 1966 this had enabled the virtualization of a complete operating system, forming an abstract imitation of a computer within a computer. Decades later these ideas coalesced into the first efficient x86 virtualization platform, released by VMware in 1999, which allowed complete isolation of the virtualized “guest” operating system from its host.
There were many practical reasons for the advancement of virtual machines. Computers quickly gain complexity as dependencies change over time, creating massive amounts of technical debt when changes cannot be accurately reproduced or recorded. Short of wiping a machine back to a fresh OS install, it was difficult to ever truly standardize a server and the software running on it. With the advent of virtual machines, users could suddenly create what was essentially their own version of an OS by saving an image of their virtual computer. This idea of definitively saving and deploying the state of an abstract definition of a computer would come to define a new era of how software is deployed.
While they provide great standardization, virtual machines remain strictly inferior to their unsimulated counterparts in performance and networking, because everything they do must still be mediated by the host. To run something like an SQL server on a series of virtual machines, you raise the overhead of your infrastructure by the cost of a full guest operating system multiplied by the number of servers you are running. Even running a single instance of Kali Linux on a MacBook Pro can introduce significant lag between commands, making it preferable to partition the hard drive and dual-boot instead. Furthermore, there was no standard programmatic way to issue commands to a virtual machine on startup, meaning dependencies still had to be installed manually before an image could be saved.
As the burden imposed by extra operating systems became more noticeable, the incentive to remove their overhead grew accordingly. Linux, the iconic leader among open source operating systems, introduced a new way to partition processes when cgroups were merged into the mainline kernel in 2008. Cgroups were a holy grail for managing hardware, allowing resources like CPU, memory, and I/O to be isolated per group of processes while everything remains on the same OS kernel. From cgroups, combined with kernel namespaces, came glorious containers: isolated, ephemeral partitions of a computer that can be created, destroyed, and reclaimed at will.
What Containers Can Do For You
The two most important benefits of Docker and Kubernetes being realized in businesses today are easily tuned reliability and self-correcting scalability. Deployments, StatefulSets, and DaemonSets are all different ways of deploying containers through Kubernetes, and between them they can express practically any architecture, from stateful applications to monitoring agents. Terraform and kops allow these architectures, along with the cloud hardware they run on, to be defined in code, transforming architecture from an abstract diagram into concrete, version-controlled code.
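As an illustration, a minimal Deployment (names and numbers below are hypothetical) declares a desired number of replicas, and Kubernetes continuously reconciles the cluster to match that declaration, replacing pods that fail.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: demo-app
    spec:
      replicas: 3                    # Kubernetes keeps three pods running at all times
      selector:
        matchLabels:
          app: demo-app
      template:
        metadata:
          labels:
            app: demo-app
        spec:
          containers:
            - name: web
              image: registry.example.com/demo-app:1.4.0
              resources:
                requests:            # reserve capacity so pods are scheduled onto suitable nodes
                  cpu: 250m
                  memory: 256Mi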
Continuous integration and deployment are becoming standard practice for quality-driven engineering teams, and containers greatly ease introducing them to your stack. Because images are built from a Dockerfile, you can define exactly which dependencies belong in a testing or production container and how they should be installed. Each time a change is pushed, containers make it trivial to run unit tests against the latest build, and once those tests pass, the exact same image can be deployed to production servers. In this way, containers can become part of a gold standard of build quality that is intrinsically made to accommodate automation.
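A minimal Dockerfile for a hypothetical Python service might look like the sketch below; the point is that the image that passes tests in CI is the very image promoted to production.

    # Hypothetical Python service: every dependency is pinned inside the image itself
    FROM python:3.11-slim
    WORKDIR /app

    # Install the exact dependency versions listed in requirements.txt
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Copy in the application code and define how the container runs it
    COPY . .
    CMD ["python", "-m", "myapp"]

A CI job can then build this image, run the test suite inside a container started from it (for example, docker run my-app:1.4.0 python -m pytest), and push that exact image to the registry production pulls from.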
Kubernetes provides both vertical and horizontal pod autoscaling, which, when combined with AWS auto scaling groups, work in concert to adjust both the hardware and the software running on it, scaling your entire organization in minutes. For businesses with highly elastic computing demands, this can be a godsend from both a product and a cost standpoint. Cloud computing and containers are quickly becoming intertwined, and the cost benefits of running in the cloud become more apparent when you account for how quickly many companies need to respond to their clients’ needs. Unless your company has the resources to fully maintain and scale data centers in every major hub on the planet, being able to create servers on demand through a cloud provider will always offer a better end product for an international company. Containers allow any organization to meet operational challenges fluidly and forgo the tradition of buying and maintaining enough hardware to meet peak demand. Competition between major cloud providers keeps driving prices down, and because Kubernetes is platform agnostic, switching from one provider to another can be surprisingly easy.
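For example, a HorizontalPodAutoscaler (again with hypothetical names and thresholds) can grow a Deployment from three to thirty pods whenever average CPU utilization climbs above a target, and shrink it back when demand falls.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: demo-app
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: demo-app
      minReplicas: 3
      maxReplicas: 30
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70   # add pods when average CPU use exceeds 70%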
The ease of use afforded by containers is staggering, even for something like a pet project. Instead of instructing users exactly how to install your dependencies and hoping their OS is close enough to mimic your own, you can simply post an image of your working software to a public repository. This not only future proofs your work by preserving the environment it runs in, but also simplifies installation from what would be a long list of dependencies to a single image that can be downloaded with the single dependency of Docker itself. Another powerful tool, Helm, allows entire infrastructure setups to be standardized into “charts”: sets of YAML files, generated from central variables, that define Kubernetes objects. Even something very complex to deploy, like a cluster of NoSQL databases, can be constructed almost instantly from an existing Helm chart.
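As a sketch, the public Bitnami chart repository can stand up a multi-node Cassandra cluster in a couple of commands; the release name is arbitrary, and the replicaCount value name is an assumption, since configurable values vary from chart to chart.

    # Add a public chart repository and deploy a multi-node Cassandra cluster from it
    helm repo add bitnami https://charts.bitnami.com/bitnami
    helm install my-cassandra bitnami/cassandra --set replicaCount=3   # value names vary by chart

    # The rendered Kubernetes manifests can be inspected before committing to an install
    helm template my-cassandra bitnami/cassandra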
I personally love working with containers, and I look forward to seeing the ecosystem grow even further as more and more companies integrate containers into their engineering stacks. With a little familiarity, Kubernetes can even enable more advanced patterns, such as running routine maintenance jobs or adding custom self-correction mechanisms, providing stability far beyond what would be possible without containers. It’s safe to say containers won’t be going away any time soon, and what’s more, they seem poised to shape software development for decades to come.
[1] “The Future of Linux Containers,” dotCloudTV, 2013.