Dog Food Not Enough for CI Servers

Our Continuum CI server constantly had problems hanging during builds, so we switched to Bamboo, which, while not perfect, is better.

This week our Bamboo server ate its own config file when it ran out of disk space. That got me thinking: most CI servers I've seen these days do a horrible job of handling the plethora of near-failure conditions that plague a build.
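
One common defense against that particular failure, sketched here as an assumption rather than anything Bamboo is known to do: never overwrite a config file in place. Write the new contents to a temporary file in the same directory, then atomically rename it over the old one, so a full disk fails the write and leaves the previous config intact. SafeConfigWriter and writeAtomically are hypothetical names; the calls themselves are standard java.nio.file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: replace a config file without ever truncating the old one. */
final class SafeConfigWriter {
    private SafeConfigWriter() {}

    static void writeAtomically(Path target, String contents) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // The temp file lives in the same directory so the rename stays on one file system.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            // If the disk is full, this throws and the original target is untouched.
            Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, target,
                    StandardCopyOption.ATOMIC_MOVE,
                    StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; removes the partial file after a failed write.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that this sketch doesn't fsync, so it guards against a full disk, not necessarily against a power cut mid-write.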

What's really needed is a Maven plugin that acts like a semi-broken build (hold your jokes), so the CI guys can test how well their code handles failures.

This plugin should have at least four different operating modes:

1. Disk hog: Write huge files in the workspace, filling up the disk if possible.
2. Memory hog: Consume huge chunks of memory until it OOMs.
3. CPU hog: Spawn a bunch of threads that grind the CPU to a halt.
4. Hung thread: Get a thread to ignore all signals sent to it and let it hang.
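
The four modes above are easy to sketch in plain Java. This is not a real Maven plugin: it's a standalone class that a hypothetical plugin goal could call with a configured mode, and every name in it is made up for illustration. One caveat on mode 4: Java code can't make the JVM ignore OS signals, so the closest approximation here is a non-daemon thread that swallows interrupts and keeps the build from ever finishing on its own.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical core of a "semi-broken build" plugin; each method is one mode. */
final class SemiBrokenBuild {
    enum Mode { DISK_HOG, MEMORY_HOG, CPU_HOG, HUNG_THREAD }

    private SemiBrokenBuild() {}

    static void run(Mode mode, Path workspace) throws IOException {
        switch (mode) {
            case DISK_HOG:    diskHog(workspace, Long.MAX_VALUE); break;
            case MEMORY_HOG:  memoryHog(); break;
            case CPU_HOG:     cpuHog(4 * Runtime.getRuntime().availableProcessors()); break;
            case HUNG_THREAD: hungThread(); break;
        }
    }

    /** 1. Disk hog: write 1 MiB chunks until maxBytes is reached or the disk is full. */
    static long diskHog(Path workspace, long maxBytes) throws IOException {
        byte[] chunk = new byte[1 << 20];
        long written = 0;
        try (OutputStream out = Files.newOutputStream(workspace.resolve("disk-hog.bin"))) {
            while (written < maxBytes) {
                out.write(chunk); // throws an IOException once the disk fills up
                written += chunk.length;
            }
        }
        return written;
    }

    /** 2. Memory hog: keep references to 16 MiB blocks until the JVM throws OutOfMemoryError. */
    static void memoryHog() {
        List<byte[]> hoard = new ArrayList<>();
        while (true) {
            hoard.add(new byte[16 << 20]);
        }
    }

    /** 3. CPU hog: start non-daemon threads that busy-loop until interrupted. */
    static List<Thread> cpuHog(int threadCount) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    // spin
                }
            }, "cpu-hog-" + i);
            t.start();
            threads.add(t);
        }
        return threads;
    }

    /** 4. Hung thread: a non-daemon thread that swallows every interrupt, so the JVM never exits by itself. */
    static Thread hungThread() {
        Thread t = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException ignored) {
                    // deliberately ignored: keep hanging
                }
            }
        }, "hung-thread");
        t.start();
        return t;
    }
}
```

A CPU-hog build only stops when something interrupts its threads or kills the JVM. That "something" is exactly what the CI server is being tested for.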

If the CI server can monitor and recover from these four common semi-broken build issues, then it's quite possibly worth its salt. Otherwise it's really just a pricey, fancy wrapper around what could be a 3-line shell script piped to mail(1).
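
One piece of that monitoring and recovery is a wall-clock limit plus the ability to kill the whole process tree, which covers the hung-thread and CPU-hog cases. A minimal Java 9+ sketch, assuming the build runs as an external process: BuildWatchdog and its Outcome values are hypothetical names, while ProcessBuilder, Process.waitFor(long, TimeUnit), Process.descendants() and destroyForcibly() are standard JDK calls.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical supervisor: run a build command and kill it if it exceeds a time limit. */
final class BuildWatchdog {
    enum Outcome { PASSED, FAILED, TIMED_OUT }

    private BuildWatchdog() {}

    static Outcome run(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        Process build = new ProcessBuilder(command).inheritIO().start();
        if (build.waitFor(timeout, unit)) {
            return build.exitValue() == 0 ? Outcome.PASSED : Outcome.FAILED;
        }
        // Hung or runaway build: kill its children first, then the process itself,
        // so forked JVMs or compilers aren't left behind holding CPU, memory, or disk.
        build.descendants().forEach(ProcessHandle::destroyForcibly);
        build.destroyForcibly();
        build.waitFor();
        return Outcome.TIMED_OUT;
    }
}
```

Sending the mail afterwards is still the easy part; the kill-and-report step on TIMED_OUT is what the shell-script version usually lacks.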

4 comments:

Owen Jacobson said...

It's probably worth doing this with a few different build tools. For example, Maven's "out of memory" condition involves a random OOME in the JVM Maven runs in, whereas a native build tool would probably end up tripping the kernel OOM killer instead. I'd say at least one Java tool, one native tool, and one interpreted tool (rake, scons, maybe make or a shell script).
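
From the CI server's side, telling those two cases apart is mostly a matter of reading the exit status and the log. A heuristic Java sketch, assuming the server launched the build with ProcessBuilder and captured its output: on Unix, OpenJDK reports a child killed by signal N as exit value 128 + N, so a SIGKILL (such as one from the kernel OOM killer) surfaces as 137, while an OOME inside a JVM build tool usually shows up in the log instead. The class and enum names are hypothetical, and a tool can also exit normally with a code above 128, so treat the result as a hint rather than a diagnosis.

```java
/** Hypothetical heuristic for classifying how a build process ended. */
final class FailureClassifier {
    enum Kind { SUCCESS, JVM_OUT_OF_MEMORY, KILLED_BY_SIGNAL, ORDINARY_FAILURE }

    private FailureClassifier() {}

    /**
     * exitCode: the value of Process.exitValue() (128 + N on Unix for death by signal N).
     * buildLog: the captured stdout/stderr of the build.
     */
    static Kind classify(int exitCode, String buildLog) {
        if (exitCode == 0) {
            return Kind.SUCCESS;
        }
        if (buildLog.contains("java.lang.OutOfMemoryError")) {
            return Kind.JVM_OUT_OF_MEMORY; // e.g. Maven dying inside its own JVM
        }
        if (exitCode > 128 && exitCode <= 128 + 64) {
            return Kind.KILLED_BY_SIGNAL; // 137 = SIGKILL, a common OOM-killer signature
        }
        return Kind.ORDINARY_FAILURE;
    }

    /** Signal number implied by a KILLED_BY_SIGNAL exit code, e.g. 9 for 137. */
    static int signalOf(int exitCode) {
        return exitCode - 128;
    }
}
```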

idcmp said...

Good point. Sounds like this is better served as a spec with a couple of example implementations.

Mark said...

Thanks for the great ideas. We'll definitely try to incorporate some of them into our testing (1 and 4 probably have higher priority).

For the moment there is a plugin that will warn you of low disk space.
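
The plugin isn't named here, but the underlying check is straightforward. A minimal Java sketch using the standard java.nio.file.FileStore API; DiskSpaceCheck and the minFreeBytes threshold are hypothetical, and what counts as "low" is up to whoever configures the server.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-build check for the disk-hog case. */
final class DiskSpaceCheck {
    private DiskSpaceCheck() {}

    /** Bytes available to this JVM on the file system holding the given path. */
    static long usableBytes(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return store.getUsableSpace();
    }

    /** True if the workspace has less than minFreeBytes available. */
    static boolean isLow(Path workspace, long minFreeBytes) throws IOException {
        return usableBytes(workspace) < minFreeBytes;
    }
}
```

Running the same check periodically during a build, not just before it, is what would catch a disk hog before the server's own files get hit.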


3 could be tricky, though. Short of running some native code, figuring out how much a process is hogging CPU cycles could be a little tough. Well worth looking into, nevertheless.
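
For what it's worth, Java 9 and later can read per-process CPU time without native code through ProcessHandle.Info.totalCpuDuration(), which returns an empty Optional when the OS doesn't report it. A sketch that samples it twice and divides by the elapsed wall-clock time; CpuSampler and its methods are hypothetical names.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical helper: estimate how many cores a process is keeping busy. */
final class CpuSampler {
    private CpuSampler() {}

    /** Cores in use over a window: CPU time consumed divided by wall-clock time elapsed. */
    static double coresUsed(Duration cpuBefore, Duration cpuAfter, Duration wallElapsed) {
        return (double) cpuAfter.minus(cpuBefore).toNanos() / wallElapsed.toNanos();
    }

    /** Samples the process's total CPU time twice, one window apart; empty if unavailable. */
    static Optional<Double> sample(ProcessHandle process, Duration window) throws InterruptedException {
        Optional<Duration> before = process.info().totalCpuDuration();
        long start = System.nanoTime();
        Thread.sleep(window.toMillis());
        Optional<Duration> after = process.info().totalCpuDuration();
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        if (!before.isPresent() || !after.isPresent()) {
            return Optional.empty();
        }
        return Optional.of(coresUsed(before.get(), after.get(), elapsed));
    }
}
```

A build that sits near the machine's core count for a long stretch with no new log output is a reasonable runaway hint; the exact threshold is a judgment call.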

idcmp said...

Hey Mark,

Thanks for the tip! I'll have to poke at it on Monday. Bamboo has been doing well by us, so thanks for the work you've been putting into it!

You're right, you'd likely need something semi-OS-specific for catching runaways. If you knew the PID of the child process, you might be able to dig around in /proc to find some of those details.
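
On Linux, the CPU counters in question live in /proc/[pid]/stat. Per the proc(5) man page, fields 14 and 15 are utime and stime, measured in clock ticks (commonly 100 per second; getconf CLK_TCK reports the actual value). A minimal Java sketch of reading them; ProcStat is a hypothetical name, and the parser splits only after the last ")" because the process-name field can itself contain spaces and parentheses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical Linux-only reader for a process's accumulated CPU time. */
final class ProcStat {
    private ProcStat() {}

    /** utime + stime (fields 14 and 15 of a /proc/[pid]/stat line), in clock ticks. */
    static long cpuTicks(String statLine) {
        // Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
        // so everything after the LAST ')' starts at field 3 (state).
        int close = statLine.lastIndexOf(')');
        String[] rest = statLine.substring(close + 1).trim().split("\\s+");
        long utime = Long.parseLong(rest[14 - 3]);
        long stime = Long.parseLong(rest[15 - 3]);
        return utime + stime;
    }

    /** Reads /proc/[pid]/stat for a live process and returns its CPU ticks so far. */
    static long cpuTicksOf(long pid) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get("/proc", Long.toString(pid), "stat"));
        return cpuTicks(new String(raw, StandardCharsets.UTF_8));
    }
}
```

Sampling cpuTicksOf(pid) twice and comparing the difference against the elapsed time gives the same "how busy is it" signal without any native code.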