Our high-performance compute cluster (HPCC) has fairly primitive tools for managing the deployment of the operating system on the compute nodes. Our current tools are “aspencopy,” which takes an “image” of the filesystem of a running server and saves it as a .tar.gz file (NOT a disk image), and its counterpart “aspenrestore,” which deploys an “image” to another server. The restore utility is smart enough to update things like the host name, IP address, host SSH keys, etc. However, the images are essentially “black boxes”: there is no system for keeping track of which configuration changes have been applied to which image, and no way to know which image is running on each server. The next cluster that I am responsible for purchasing must include a configuration management/data center automation system, such as:
On a related note, Vagrant is a system for managing virtual machines. You define a virtual machine configuration in a specification file (a “Vagrantfile”), and Vagrant automates the startup and shutdown of any number of virtual machines described there.
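To make the idea of a specification file concrete, here is a minimal sketch of a Vagrantfile that defines three virtual machines. The box name, the node names, the IP addresses, and the count of three are all assumptions chosen for illustration; none of them come from our cluster.

```ruby
# Vagrantfile (illustrative sketch only; every value below is an assumption)
Vagrant.configure("2") do |config|
  # Base box shared by all machines. "hashicorp/bionic64" is only an example;
  # substitute whatever box matches the OS you actually want to test.
  config.vm.box = "hashicorp/bionic64"

  # Define three machines named node1..node3 (hypothetical names).
  (1..3).each do |i|
    config.vm.define "node#{i}" do |node|
      node.vm.hostname = "node#{i}"
      # Give each machine a fixed address on a private host-only network.
      node.vm.network "private_network", ip: "192.168.56.#{10 + i}"
    end
  end
end
```

With a file like this in the current directory, `vagrant up` should start all three machines, `vagrant halt` shuts them down, and `vagrant destroy` removes them. Changing the range in the loop is all it takes to scale the number of machines up or down.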