Packaging up everything I need into a self-contained image is incredibly powerful. Using Vagrant and VirtualBox to spin up a CentOS distro, configure it as I need and then distribute it to delegates is really useful. Best of all, at the end of a course, a quick “rm -Rf” cleans up my overflowing SSD.
However, it’s not all rosy with virtual machines:
- They are slow to start up
- They are a pain to integrate into the host environment
- They can be huge
- The indirection to the bare metal makes processes running in a VM slower
This is why I like Docker and containers.
Containerisation allows a process to run in isolation with all of its dependencies. A program is packaged up into an image along with the entire file system it needs (all of its dependencies). That image can be instantiated as a running container. The container is isolated - processes, network and file system - but runs on the host, sharing its kernel. Containers are therefore more lightweight than VMs and provide near-native performance - see this paper.
Docker provides a great containerisation solution. Unsurprisingly, the name comes from the shipping industry. In ye olden days, when loading a ship with cargo, a crowd of dockers would load and arrange the items onto the ship. Skilled dockers could arrange the items safely and make maximal use of the space. When the ship docked at its destination port, another crowd of dockers would unload the ship. In modern shipping we use standard containers to transport goods. Standard container footprints allow for predictable arrangement, easy loading and unloading, mounting onto other vehicles, automation and so on. Now the entire process can be handled by a few operators.
With software, having a standard mechanism for transporting a program with all dependencies is just as powerful.
- It is easy to spin up a program without intimate knowledge of all the dependencies and configuration required
- The benefits of a consistent workflow can’t be overstated
  - Development, Testing and Production all use the same artefacts - Docker images
  - This leads to fewer surprises and issues at release time
  - Onboarding new team members is easier
- Servers can be used more effectively
  - Each running container is isolated, so it is robust and secure to run multiple processes on a single system
- Live updates are easier
  - Scaling services up and down and rolling updates are simpler
  - Containers can be spun up pretty much instantly (once downloaded)
  - Containers can be cleaned up easily and completely
  - Deployed artefacts can be versioned
Docker runs on Windows, macOS and Linux. Containerisation is only supported natively on Linux and Windows Server, so on macOS and desktop editions of Windows a single VM running Linux hosts the Docker Engine, with a native client talking to it (don’t worry - the Docker installer takes care of all of this).
It is easy to install (see here) and, once installed, it is really easy to start using it to run all sorts of programs.
Everything is driven from the command line, with new containers spun up using,
$ docker run <image>
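For example, Docker’s tiny hello-world image makes a quick sanity check that everything is wired up correctly:

```shell
# Downloads the hello-world image on first use, runs it in a new
# container, prints a greeting and exits.
docker run hello-world
```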
It’s easy to create our own images, but let’s make use of some prebuilt ones first. On Docker Hub we find a load of useful images - for example, this MySQL image. The image is called ‘mysql’, so to run it all we need to do is issue,
$ docker run mysql
Now, in reality, we need to pass some additional flags: as it stands, the container will run in the foreground and be completely isolated, so the database will be unreachable from outside the container. The command we actually want is,
$ docker run -p 3306:3306 -d -e MYSQL_ROOT_PASSWORD=my-secret-pw mysql
Let’s break this down.
- ‘-p 3306:3306’ maps port 3306 on the host to port 3306 in the container
  - Now the MySQL DB is accessible via the host machine
- ‘-d’ runs the container in the background (detached)
- ‘-e MYSQL_ROOT_PASSWORD=my-secret-pw’ sets the DB root password via an environment variable
  - You will usually find the supported configuration points described on the image’s Docker Hub page
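With the port published, the database can be reached from the host like any local MySQL server - for example, with a standard mysql client (assuming one is installed on the host):

```shell
# Connect from the host to the containerised MySQL server.
# 127.0.0.1 is used rather than 'localhost' to force a TCP connection
# to the published port; enter 'my-secret-pw' when prompted.
mysql -h 127.0.0.1 -P 3306 -u root -p
```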
The first time you issue this command it will need to download the image, but afterwards the image is cached locally and multiple instantiations are almost instantaneous. Docker uses clever immutable layering and a copy-on-write mechanism, so each instantiated container shares the image’s file system and only stores its own differences. This makes each container very lightweight.
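The local image cache can be inspected at any time:

```shell
# List images cached locally; subsequent 'docker run mysql' calls
# reuse these layers rather than downloading them again.
docker images
```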
We can see our running container by issuing,
$ docker ps
CONTAINER ID        IMAGE     COMMAND                  CREATED         STATUS         PORTS                    NAMES
1ffec5bf70a8        mysql     "docker-entrypoint..."   9 seconds ago   Up 8 seconds   0.0.0.0:3306->3306/tcp   kind_poincare
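And when we’re finished with it, cleaning up is just as easy, using the container ID (or name) from the ‘docker ps’ output:

```shell
# Stop the running container, then remove it (with -v also removing
# any anonymous volumes it created).
docker stop 1ffec5bf70a8
docker rm -v 1ffec5bf70a8
```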
What’s that you say - you actually need a Postgres DB? No problem! A quick search on Docker Hub finds this image.
$ docker run -p 5432:5432 -d -e POSTGRES_PASSWORD=mysecretpassword postgres
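As with MySQL, the containerised database is then reachable from the host with a standard client (assuming psql is installed locally):

```shell
# Connect to the containerised Postgres server via the published port;
# enter 'mysecretpassword' when prompted.
psql -h localhost -p 5432 -U postgres
```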
Want to spin up a Jenkins server? Try this,
$ docker run -d -p 8080:8080 -p 50000:50000 -v /home/eboyle/myjenkinsdata:/var/jenkins_home jenkins
This adds one more option, ‘-v <host_path>:<container_path>’, which sets up a volume. It maps the local host path ‘/home/eboyle/myjenkinsdata’ to ‘/var/jenkins_home’ inside the container, allowing the Jenkins configuration files to persist across container restarts.
Now picture your own applications and services being as easy to spin up.
This only scratches the surface of Docker, but as you can see, once you know how it works it’s really easy to spin up programs no matter what their dependencies are. This is useful not just during development but also for testing. And if your production environment also uses Docker, then the development, testing and production environments become the same and releases will be a lot smoother.