Optimizing the Image: Removing Package Overhead

Package Overhead

Part of the Optimizing The Image Series

Special thanks to @opgoodness for Alpine knowledge and reviewing!

Package overhead is a potential issue some may run into while optimizing an image for containers. The confusion begins with a base image like Ubuntu or Debian. Because these are the most similar to VMs and easy to compile and build on, many people simply extend the official Ubuntu or Debian images for their projects. Unfortunately, this does not always lead to an optimized image. Below are a few reasons why you should build only what your application needs, and nothing more, so you can avoid mistakes I have made in the past.

If you aren’t familiar with Alpine, a minimal, security-oriented Linux distro, I highly suggest reading about it, mainly because we are going to use it.

NodeJS: Containerize THIS

Let’s take a NodeJS application as an example and break it down into steps:

  1. we need nodejs, better install it
  2. we need npm to install all of our packages
  3. do we bundle js? might need to run that (webpack)

Okay, three steps. Seems pretty simple, right? Let’s do this in Ubuntu.


FROM ubuntu
RUN apt-get update && \
    apt-get install -y nodejs npm
WORKDIR /app
CMD ["node"]

When I first built this image, my only thought was ‘wow, this is easy’. Then I saw the image size.


ubuntu-node latest 878f2533f01a Less than a second ago 463.5 MB

The first thought that came to me: what exactly did I just install? I thought containers were tiny and to the point… which well-designed ones are. But by using Ubuntu as the base image, I was not considering the overhead that base image brings. Sure, apt-get is awesome and easy, but have you ever considered everything it installs?
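One way to answer that question from inside the Ubuntu container is to ask dpkg how much space each installed package takes. The sketch below uses a hypothetical helper, top_packages, which sorts dpkg-query output and prints the N largest entries. Note that dpkg reports Installed-Size in KiB, and that value is the estimate declared by each package rather than a measured figure.

```shell
#!/bin/sh
# top_packages N: read "size<TAB>package" lines on stdin (size in KiB, as
# printed by dpkg-query's ${Installed-Size} field) and print the N largest,
# biggest first. Hypothetical helper for illustration.
top_packages() {
  n="${1:-10}"
  # Sort numerically, descending, on the first (size) field only.
  sort -k1,1nr | head -n "$n"
}
```

Inside the container, it could be fed with: dpkg-query -W -f='${Installed-Size}\t${Package}\n' | top_packages 10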

Let’s walk through this…

The Typical Install


root@d9506c4670ed:/# apt-get install nodejs
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libicu55 libssl1.0.0 libuv1
The following NEW packages will be installed:
  libicu55 libssl1.0.0 libuv1 nodejs
0 upgraded, 4 newly installed, 0 to remove and 7 not upgraded.
Need to get 11.9 MB of archives.
After this operation, 47.7 MB of additional disk space will be used.

Have you ever wondered which packages nodejs relies on? What is in its dependency tree?


root@d9506c4670ed:/# apt-cache depends nodejs
nodejs
  Depends: libc6
  Depends: libgcc1
  Depends: libicu55
  Depends: libssl1.0.0
  Depends: libstdc++6
  Depends: libuv1
  Depends: zlib1g
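To pull just the package names out of that listing, for example to count them or to feed them into another command, a small awk filter is enough. This is a sketch with a hypothetical helper name, list_depends. It keeps lines whose first field is "Depends:", plus "|Depends:", which apt-cache uses to mark an entry that has alternatives.

```shell
#!/bin/sh
# list_depends: read `apt-cache depends PKG` output on stdin and print the
# direct dependency names, one per line. Entries printed as "|Depends:"
# (the first of a set of alternatives) are included too.
# Hypothetical helper for illustration.
list_depends() {
  awk '$1 == "Depends:" || $1 == "|Depends:" { print $2 }'
}
```

With the output above, apt-cache depends nodejs | list_depends would print the seven libraries listed. For a view of the whole tree, apt-cache depends also accepts --recurse, which can be combined with --no-recommends and --no-suggests to stay on hard dependencies.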

This doesn’t seem like much, but we also need to remember NPM:


root@d9506c4670ed:/# apt-get install npm
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  binutils build-essential bzip2 ca-certificates cpp cpp-5 dpkg-dev fakeroot file g++ g++-5 gcc gcc-5 gyp ifupdown iproute2 isc-dhcp-client isc-dhcp-common javascript-common libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl libasan2 libatm1 libatomic1 libc-dev-bin libc6 libc6-dev libcc1-0 libcilkrts5 libdns-export162 libdpkg-perl libexpat1 libfakeroot libffi6 libfile-fcntllock-perl libgcc-5-dev libgdbm3 libgmp10 libgomp1 libisc-export160 libisl15 libitm1 libjs-inherits libjs-jquery libjs-node-uuid libjs-underscore liblsan0 libmagic1 libmnl0 libmpc3 libmpfr4 libmpx0 libperl5.22 libpython-stdlib libpython2.7-minimal libpython2.7-stdlib libquadmath0 libsqlite3-0 libssl-dev libssl-doc libstdc++-5-dev libtsan0 libubsan0 libuv1-dev libxtables11 linux-libc-dev make manpages manpages-dev mime-support netbase node-abbrev node-ansi node-ansi-color-table node-archy node-async node-block-stream node-combined-stream node-cookie-jar node-delayed-stream node-forever-agent node-form-data node-fstream node-fstream-ignore node-github-url-from-git node-glob node-graceful-fs node-gyp node-inherits node-ini node-json-stringify-safe node-lockfile node-lru-cache node-mime node-minimatch node-mkdirp node-mute-stream node-node-uuid node-nopt node-normalize-package-data node-npmlog node-once node-osenv node-qs node-read node-read-package-json node-request node-retry node-rimraf node-semver node-sha node-sigmund node-slide node-tar node-tunnel-agent node-underscore node-which nodejs-dev openssl patch perl perl-modules-5.22 python python-minimal python-pkg-resources python2.7 python2.7-minimal rename xz-utils zlib1g-dev
Suggested packages:
  binutils-doc bzip2-doc cpp-doc gcc-5-locales debian-keyring g++-multilib g++-5-multilib gcc-5-doc libstdc++6-5-dbg gcc-multilib autoconf automake libtool flex bison gdb gcc-doc gcc-5-multilib libgcc1-dbg libgomp1-dbg libitm1-dbg libatomic1-dbg libasan2-dbg liblsan0-dbg libtsan0-dbg libubsan0-dbg libcilkrts5-dbg libmpx0-dbg libquadmath0-dbg ppp rdnssd iproute2-doc resolvconf avahi-autoipd isc-dhcp-client-ddns apparmor apache2 | lighttpd | httpd glibc-doc libstdc++-5-doc make-doc man-browser node-hawk node-aws-sign node-oauth-sign node-http-signature debhelper ed diffutils-doc perl-doc libterm-readline-gnu-perl | libterm-readline-perl-perl python-doc python-tk python-setuptools python2.7-doc binfmt-support
The following NEW packages will be installed:
  binutils build-essential bzip2 ca-certificates cpp cpp-5 dpkg-dev fakeroot file g++ g++-5 gcc gcc-5 gyp ifupdown iproute2 isc-dhcp-client isc-dhcp-common javascript-common libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl libasan2 libatm1 libatomic1 libc-dev-bin libc6-dev libcc1-0 libcilkrts5 libdns-export162 libdpkg-perl libexpat1 libfakeroot libffi6 libfile-fcntllock-perl libgcc-5-dev libgdbm3 libgmp10 libgomp1 libisc-export160 libisl15 libitm1 libjs-inherits libjs-jquery libjs-node-uuid libjs-underscore liblsan0 libmagic1 libmnl0 libmpc3 libmpfr4 libmpx0 libperl5.22 libpython-stdlib libpython2.7-minimal libpython2.7-stdlib libquadmath0 libsqlite3-0 libssl-dev libssl-doc libstdc++-5-dev libtsan0 libubsan0 libuv1-dev libxtables11 linux-libc-dev make manpages manpages-dev mime-support netbase node-abbrev node-ansi node-ansi-color-table node-archy node-async node-block-stream node-combined-stream node-cookie-jar node-delayed-stream node-forever-agent node-form-data node-fstream node-fstream-ignore node-github-url-from-git node-glob node-graceful-fs node-gyp node-inherits node-ini node-json-stringify-safe node-lockfile node-lru-cache node-mime node-minimatch node-mkdirp node-mute-stream node-node-uuid node-nopt node-normalize-package-data node-npmlog node-once node-osenv node-qs node-read node-read-package-json node-request node-retry node-rimraf node-semver node-sha node-sigmund node-slide node-tar node-tunnel-agent node-underscore node-which nodejs-dev npm openssl patch perl perl-modules-5.22 python python-minimal python-pkg-resources python2.7 python2.7-minimal rename xz-utils zlib1g-dev
The following packages will be upgraded:
  libc6
1 upgraded, 131 newly installed, 0 to remove and 6 not upgraded.
Need to get 61.5 MB of archives.
After this operation, 243 MB of additional disk space will be used.

And here are the package dependencies:


root@d9506c4670ed:/# apt-cache depends npm
npm
  Depends: nodejs
  Depends: node-abbrev
  Depends: node-ansi
  Depends: node-ansi-color-table
  Depends: node-archy
  Depends: node-block-stream
  Depends: node-fstream
  Depends: node-fstream-ignore
  Depends: node-github-url-from-git
  Depends: node-glob
  Depends: node-graceful-fs
  Depends: node-inherits
  Depends: node-ini
  Depends: node-lockfile
  Depends: node-lru-cache
  Depends: node-minimatch
  Depends: node-mkdirp
  Depends: node-gyp
  Depends: node-nopt
  Depends: node-npmlog
  Depends: node-once
  Depends: node-osenv
  Depends: node-read
  Depends: node-read-package-json
  Depends: node-request
  Depends: node-retry
  Depends: node-rimraf
  Depends: node-semver
  Depends: node-sha
  Depends: node-slide
  Depends: node-tar
  Depends: node-underscore
  Depends: node-which

I am going to be honest here: I don’t know what the hell half of these packages are. Are they even necessary? We are at 243MB of packages, but apt-get was also installing recommended packages, so we can pass --no-install-recommends to narrow down the install:


root@d9506c4670ed:/# apt-get install --no-install-recommends npm
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  ca-certificates gyp libc-dev-bin libc6 libc6-dev libexpat1 libffi6 libjs-inherits libjs-node-uuid libjs-underscore libpython-stdlib libpython2.7-minimal libpython2.7-stdlib libsqlite3-0 libssl-dev libuv1-dev linux-libc-dev mime-support node-abbrev node-ansi node-ansi-color-table node-archy node-async node-block-stream node-combined-stream node-cookie-jar node-delayed-stream node-forever-agent node-form-data node-fstream node-fstream-ignore node-github-url-from-git node-glob node-graceful-fs node-gyp node-inherits node-ini node-json-stringify-safe node-lockfile node-lru-cache node-mime node-minimatch node-mkdirp node-mute-stream node-node-uuid node-nopt node-normalize-package-data node-npmlog node-once node-osenv node-qs node-read node-read-package-json node-request node-retry node-rimraf node-semver node-sha node-sigmund node-slide node-tar node-tunnel-agent node-underscore node-which nodejs-dev openssl python python-minimal python-pkg-resources python2.7 python2.7-minimal zlib1g-dev
Suggested packages:
  glibc-doc manpages-dev javascript-common node-hawk node-aws-sign node-oauth-sign node-http-signature debhelper python-doc python-tk python-setuptools python2.7-doc binutils binfmt-support
Recommended packages:
  manpages manpages-dev javascript-common libjs-jquery libssl-doc file build-essential
The following NEW packages will be installed:
  ca-certificates gyp libc-dev-bin libc6-dev libexpat1 libffi6 libjs-inherits libjs-node-uuid libjs-underscore libpython-stdlib libpython2.7-minimal libpython2.7-stdlib libsqlite3-0 libssl-dev libuv1-dev linux-libc-dev mime-support node-abbrev node-ansi node-ansi-color-table node-archy node-async node-block-stream node-combined-stream node-cookie-jar node-delayed-stream node-forever-agent node-form-data node-fstream node-fstream-ignore node-github-url-from-git node-glob node-graceful-fs node-gyp node-inherits node-ini node-json-stringify-safe node-lockfile node-lru-cache node-mime node-minimatch node-mkdirp node-mute-stream node-node-uuid node-nopt node-normalize-package-data node-npmlog node-once node-osenv node-qs node-read node-read-package-json node-request node-retry node-rimraf node-semver node-sha node-sigmund node-slide node-tar node-tunnel-agent node-underscore node-which nodejs-dev npm openssl python python-minimal python-pkg-resources python2.7 python2.7-minimal zlib1g-dev
The following packages will be upgraded:
  libc6
1 upgraded, 72 newly installed, 0 to remove and 6 not upgraded.
Need to get 14.9 MB of archives.
After this operation, 62.2 MB of additional disk space will be used.

Great, we are down to 62.2MB: roughly a 74% reduction in installation space from 243MB.
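Instead of remembering the flag on every install, apt can be told to skip recommended packages globally through its APT::Install-Recommends option, read from files in /etc/apt/apt.conf.d. The sketch below assumes a Debian/Ubuntu image and root access. The function name write_no_recommends and the file name 99no-recommends are arbitrary choices. The target directory is a parameter so the function can be tried outside /etc.

```shell
#!/bin/sh
# write_no_recommends [DIR]: write an apt config snippet into DIR
# (default /etc/apt/apt.conf.d) so apt-get no longer installs Recommends.
# apt reads the files in that directory in lexical order; the name
# 99no-recommends is an arbitrary, hypothetical choice.
write_no_recommends() {
  dir="${1:-/etc/apt/apt.conf.d}"
  mkdir -p "$dir"
  cat > "$dir/99no-recommends" <<'EOF'
APT::Install-Recommends "false";
EOF
}
```

After running it against the real directory, apt-config dump | grep Install-Recommends should show the new value.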

So let’s try this again:


FROM ubuntu
RUN apt-get update && \
    apt-get install --no-install-recommends -y nodejs npm
WORKDIR /app
CMD ["node"]


ubuntu-node previous 878f2533f01a Less than a second ago 463.5 MB
ubuntu-node optimized 0c4d0d886afd About a minute ago 292.5 MB

But is this enough? Our image is still 292.5MB at the end of the day (about 63% of its original size). It still seems strange to me that a container is this large. When I think of a container, I think of a small, portable piece of software, not a VM emulation.
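You can double-check these ratios by plugging the reported sizes into a tiny helper. The sketch below uses a hypothetical function, pct_of, and relies on awk for the floating-point division, because POSIX sh arithmetic is integer-only. The inputs are the MB figures reported by docker images and apt-get.

```shell
#!/bin/sh
# pct_of NEW OLD: print NEW as a percentage of OLD, to one decimal place.
# Hypothetical helper; awk handles the floating-point math.
pct_of() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f\n", new / old * 100 }'
}
```

Here pct_of 292.5 463.5 prints 63.1, the optimized Ubuntu image relative to the first build. pct_of 62.2 243 prints 25.6, which matches the roughly 74% cut in apt's install size.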

The Minimal Install

Rather than digging through the specs to find the minimal install, we can lean on Docker, which is nice enough to provide a pretty solid NodeJS image through its official repository (node on Docker Hub):

docker pull node:7

There, we are done, right? We can breathe a sigh of relief and say ‘we optimized our container deployment!!’. Well, we are getting there, but there is still quite a bit of bloat in the latest image (v7 at the time of this writing). Let’s take a look at its Dockerfile:


FROM buildpack-deps:jessie
RUN groupadd --gid 1000 node \
  && useradd --uid 1000 --gid node --shell /bin/bash --create-home node
# gpg keys listed at https://github.com/nodejs/node
RUN set -ex \
  && for key in \
    9554F04D7259F04124DE6B476D5A82AC7E37093B \
    94AE36675C464D64BAFA68DD7434390BDBE9B9C5 \
    0034A06D9D9B0064CE8ADF6BF1747F4AD2306D93 \
    FD3A5288F042B6850C66B31F09FE44734EB7990E \
    71DCFD284A79C3B38668286BC97EC7A07EDE3FC1 \
    DD8F2338BAE7501E3DD5AC78C273792F7D83545D \
    B9AE9905FFD7803F25714661B63B535A4C206CA9 \
    C4F0DFFF4E8C1A8236409D08E73BC641CC11F4C8 \
  ; do \
    gpg --keyserver ha.pool.sks-keyservers.net --recv-keys "$key"; \
  done
ENV NPM_CONFIG_LOGLEVEL info
ENV NODE_VERSION 7.2.1
RUN curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-x64.tar.xz" \
  && curl -SLO "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc" \
  && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \
  && grep " node-v$NODE_VERSION-linux-x64.tar.xz\$" SHASUMS256.txt | sha256sum -c - \
  && tar -xJf "node-v$NODE_VERSION-linux-x64.tar.xz" -C /usr/local --strip-components=1 \
  && rm "node-v$NODE_VERSION-linux-x64.tar.xz" SHASUMS256.txt.asc SHASUMS256.txt \
  && ln -s /usr/local/bin/node /usr/local/bin/nodejs
CMD [ "node" ]

$ docker pull node:7
$ docker images | grep node
node 7 36dc1bb7a52b 3 days ago 655.5 MB

Notice how they install NodeJS: download the official release tarball, verify its signature and checksum, and unpack it into /usr/local. This is a beautiful example of how container images can be built. However, there is one tiny, potentially ‘nitpicky’ issue: we are still using Debian (buildpack-deps:jessie)! To some this might not matter, and it truly might not, but after reading about and working with Docker I can’t help but wonder why anyone would need Debian or another regular VM distro as a container image (and it’s huge!). Remember, we aren’t implementing a server; we are deploying an application/process. That means smaller and single-focused.
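The core of that Dockerfile is the verify-then-unpack step. It looks up the tarball's line in SHASUMS256.txt and lets sha256sum -c confirm the file matches before anything is extracted. The sketch below reproduces only the checksum half, with a hypothetical function name, verify_sha256. The GPG signature check on SHASUMS256.txt.asc is deliberately left out.

```shell
#!/bin/sh
# verify_sha256 SUMS FILE: succeed only if FILE's entry in SUMS (lines of
# "<hex digest>  <name>", as in SHASUMS256.txt) matches FILE's content.
# Mirrors: grep " node-v...tar.xz$" SHASUMS256.txt | sha256sum -c -
# Run it from the directory containing FILE, since names are relative.
verify_sha256() {
  sums="$1"
  file="$2"
  # Anchor on " NAME$" so only FILE's own entry is checked (not FILE.asc).
  grep " $file\$" "$sums" | sha256sum -c -
}
```

If the entry is missing or the digest differs, sha256sum -c exits non-zero. In a Dockerfile RUN chain joined with &&, that stops the build before tar runs.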

Enter Alpine

Instead of running docker pull node, we are going to specify a tag in their repository that uses Alpine.



$ docker pull node:7-alpine
$ docker images | grep node
node 7-alpine a1c188c2c5e1 3 days ago 55.29 MB
node 7 36dc1bb7a52b 3 days ago 655.5 MB

55.29MB. That is less than a fifth of even the optimized Ubuntu image (292.5MB) and under a tenth of the Debian-based node:7 image (655.5MB).

Alpine Drawbacks
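  • No Bash by default (the default shell is BusyBox ash at /bin/sh)
  • Different C library than mainstream Linux distros (musl libc instead of glibc), so binaries built against glibc may not run unmodified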
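You can check which of these applies inside a running container with a couple of lines of plain sh. The sketch below assumes that musl's dynamic loader lives at /lib/ld-musl-ARCH.so.1, as it does on Alpine (for example /lib/ld-musl-x86_64.so.1). The helper name libc_flavor is hypothetical.

```shell
#!/bin/sh
# libc_flavor: print "musl" if a musl dynamic loader is present (as on
# Alpine), otherwise "glibc/other". Hypothetical helper; it only checks
# for the loader file and does not inspect any binary.
libc_flavor() {
  for f in /lib/ld-musl-*.so.1; do
    # If the glob matched nothing, $f is the literal pattern and -e fails.
    [ -e "$f" ] && { echo musl; return 0; }
  done
  echo "glibc/other"
}
```

Scripts meant for Alpine are safest written for /bin/sh. If Bash is genuinely needed, apk add --no-cache bash installs it, at the cost of some of the size savings.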

Final Thoughts

At the end of the day, performance is what matters most to many people. Obviously you need to test your application to ensure it performs as expected on whichever base image you decide to build it on top of, but don’t forget the lesson: do you really need that massive base image? The overhead and potential security risks from additional packages may not be worth it.

Never shy away from asking the question: is this container built for my specific use case and application? Originally I didn’t bother asking these questions. I used phusion/baseimage and rolled with it. In the early days it was about moving to Docker, not necessarily about optimizing image installs. After all, “we were moving from FreeBSD Jails,” I always said. I know now that was an oversight. Why? Because containers work best when they are application-based, small, and single-process oriented: remove the cruft, secure the image.

Also, please note that removing packages and measuring an image based on disk space is not the only measurement one can make for an optimized image. Next we will take a look at measuring resource utilization of particular packages and how we can potentially optimize from the perspective of CPU/MEM utilization.

Managing WordPress from CLI

As a System Administrator, I prefer things to be super simple and preferably done from the command line (CLI). After working with WordPress enough, I became extremely dissatisfied with the update process: I would have to go somewhere, curl/wget a file down, put it in place, manually remove the old one, and go through the tedious task of updating. If this were Docker, I would simply rebuild an image… but even then, I would have to download and update plugins. Honestly, it’s nasty and a waste of time. This is where WP-CLI comes into play. WP-CLI is a command line tool (packaged as a .phar) that I recommend, and would argue is essential for administering WordPress. Here are the commands I ran to completely update my WordPress from some ancient version:

$ wp --path=/path/to/jbkc85.com core update
Updating to version 4.4 (en_US)...
Downloading update from https://downloads.wordpress.org/release/wordpress-4.4-new-bundled.zip...
Unpacking the update...
Success: WordPress updated successfully.
$ wp --path=/path/to/jbkc85.com plugin update --all
...
Disabling Maintenance mode...
Success: Translations updates are not needed for the 'English (US)' locale.
Success: Updated 3/3 plugins.
$ wp --path=/path/to/jbkc85.com theme update --all
Enabling Maintenance mode...
...
Disabling Maintenance mode...
Success: Translations updates are not needed for the 'English (US)' locale.
Success: Updated 7/7 themes.

Notice how three commands updated multiple areas of my site? This is what System Administration is all about: find tools that make the job simple so you can work on other important matters. With this tool, not only am I able to keep my sites up-to-date, but I can also build a rather interesting Docker distribution for WordPress, producing images with brand-new, up-to-date plugins/themes on each build (more on this later). I highly encourage ANYONE running a WordPress site to check this out and start using it!
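Since the whole update is just those three commands, they can be wrapped in one shell function and run from cron or a build step. This is a minimal sketch that assumes wp is on the PATH. The function name wp_update_all is hypothetical. It uses only the WP-CLI subcommands shown above and stops at the first one that fails.

```shell
#!/bin/sh
# wp_update_all SITE_PATH: update WordPress core, then all plugins, then
# all themes for the site at SITE_PATH, stopping at the first failure.
# Hypothetical wrapper around the three WP-CLI commands shown above.
wp_update_all() {
  site="$1"
  wp --path="$site" core update &&
    wp --path="$site" plugin update --all &&
    wp --path="$site" theme update --all
}
```

For the site above, that would be wp_update_all /path/to/jbkc85.com. Chaining with && means a failed core update leaves plugins and themes untouched, so you can investigate before going further.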

Resource: http://wp-cli.org/