One of the fun things I was charged with at IC was deploying a Linux-based platform for our Open Source applications. We used CentOS because of its close (read: pretty much identical) resemblance to a certain “prominent North American Enterprise Linux vendor”’s distribution. I had used RHEL pretty extensively before, so it was a comfortable fit for me to pick up where my predecessors had left off. I immediately upgraded the systems, brought them current with patches and fixes, and took care of the DST changes. After that, we had to inaugurate a new hardware platform that included some new (to us) stuff such as InfiniBand, multipathing (Fibre Channel), and out-of-band management (IPMI, BMC, etc.). This was not easy and involved a great deal of trial and error. Fortunately, I had someplace to run to when the going got a little too tough: VMware.
It’s a pretty clear choice to make these days: virtualization is in, clunky hardware is out. There are some pretty obvious wins here, such as reduced hardware infrastructure, reduced networking hardware needs, fewer points of failure hardware-wise, lower power consumption (mean wattage per VM is far below what it would be on standalone hardware), and rapid deployment. There are some unknowns, mostly regarding the overall security of the VMs, but we’ll skip that for now and pretend it’s not there. Yes, I’m in 3rd grade.
The beauty for me was the ability to create appliances. Think of these as virtual TV dinners. We unwrap them, do a minor amount of preparation, cook (bring online), and serve. When we parted ways, I had four appliances 90% completed. I was only struggling with how to bring them onto a cluster without duplicating node numbers and other node-specifics. I had got as far as getting a custom firstboot script up and running to prompt for some basic parameters before coming online and going live with the rest of the network. A few of the problems I was having involved locating and joining clusters that were not on the local network. Part of what I was accomplishing involved disaster recovery, and I didn’t want to create an even larger disaster if I could avoid it. Other issues were much smaller and only involved some basic networking (multicasting) problems and some weirdness with the VMware tools under CentOS 4.3. And don’t get me started on timekeeping.
Despite all the work, it turns out that it’s more than worth it in the end. When I need a new machine, I just clone the template VM and bring it online. I can keep dedicated service appliances (such as Tomcat or MySQL) or have a first-boot config menu that prompts for an appliance type. The trick is creating the template VM!
Accomplishing it isn’t really that tough, as I’m sure you could imagine. There are several ways to do it, but for these purposes I’ll stick with the two that I used personally. The first method was to just do a bare-bones install using the normal CentOS installation procedure, then add my packages later. The packages I was going to add were stored in a local ‘yum’ repository, so all I really had to do was copy over a master repo.conf and issue a ‘yum install appliance_vm’. I’d then boot the VM, do some basic configuration, and set the startup services. After I knew everything was running, I’d migrate over the necessary config files, do a sanity check reboot, and then test it. Once I knew I had a working appliance, I’d reset the configurations so as to avoid conflicts, and then take a snapshot. That worked pretty well, but really wasn’t ideal.
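To give a flavor of that first method, here is a minimal Python sketch that renders a yum repository definition and builds the install command. It is an illustration only, not the files I actually used: the repo ID, the base URL, and the helper names `render_repo` and `install_command` are all made up, and only the `appliance_vm` package name comes from the setup described above.

```python
def render_repo(repo_id, name, baseurl, gpgcheck=False):
    """Return the text of one yum repository stanza.

    All argument values are supplied by the caller; nothing here is
    specific to the original environment.
    """
    lines = [
        f"[{repo_id}]",
        f"name={name}",
        f"baseurl={baseurl}",
        "enabled=1",
        f"gpgcheck={1 if gpgcheck else 0}",
    ]
    return "\n".join(lines) + "\n"


def install_command(package="appliance_vm"):
    """Build the argument list for a non-interactive yum install."""
    return ["yum", "-y", "install", package]


if __name__ == "__main__":
    # Hypothetical repo ID and URL, purely for illustration.
    print(render_repo(
        "local-appliances",
        "Local appliance packages",
        "http://repo.example.internal/centos/4/",
    ))
    print(" ".join(install_command()))
```

The rendered text would be saved on the freshly installed VM wherever yum looks for repository definitions, and the command list could be handed to something like `subprocess.run` once the repository is reachable.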
What really made me happy was when I created my own installation media. This way, there was a choice: if the template was messed up, lost, or inaccessible, the install media could be used to accomplish the same thing. This plan was really great for me because it allowed for not only the flexibility of templates but also the disaster recovery capabilities of physical media. The installer could boot from anything and wasn’t tied to any particular hypervisor. In testing, I booted and installed this image using VMware Workstation, Parallels Desktop, and Xen*. This totally rocked. Because I could leave the hardware detection in the loop, I was able to provide a crazy amount of flexibility for very low cost. If the worst happened, I’d still be able to bring up servers even from desktops and workstations. Because of the MPLS network we had in place, this also would have allowed for massive DR distribution in the case of natural disaster or pandemic.
I never thought I’d be hoping for some crazy strain of tern flu or something.
Combined with OCFS**, ZFS, and Linux-HA over multicast, this more than gives Microsoft a run for its money. Theoretically I’d have been able to distribute CDs to everyone in the company with simple instructions, and have a number silkscreened onto each disc corresponding to the node ID that would have to be entered at install time. Then maybe have a line at the end of the instructions that says “this message will self-destruct in five… four…”. The point here is that it would have been very cool had I been given the chance to implement it. The first boot brings up a menu prompting the user for the unique number printed on the media and for the type of server it’s supposed to be. Then the corresponding services have their configs activated and enter a running state, a sanity check verifies network connectivity, and the user receives confirmation that they’re up and running. Or they don’t.
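For the curious, here is roughly what that first-boot logic might look like in Python. Treat it as a sketch under assumptions, not the script I actually ran: the appliance-to-service table, the `nodeNNN` hostname scheme, the 1 to 254 ID range, and every function name are hypothetical.

```python
import socket

# Hypothetical mapping from appliance type to the services it enables.
APPLIANCE_SERVICES = {
    "tomcat": ["tomcat"],
    "mysql": ["mysqld"],
}


def parse_node_id(raw, max_id=254):
    """Validate the number printed on the media (range is an assumption)."""
    try:
        node_id = int(raw.strip())
    except ValueError:
        raise ValueError(f"node ID must be a number, got {raw!r}") from None
    if not 1 <= node_id <= max_id:
        raise ValueError(f"node ID must be between 1 and {max_id}")
    return node_id


def plan_first_boot(raw_node_id, appliance_type):
    """Turn the two menu answers into a plan; changes nothing on disk."""
    node_id = parse_node_id(raw_node_id)
    kind = appliance_type.strip().lower()
    if kind not in APPLIANCE_SERVICES:
        raise ValueError(f"unknown appliance type: {appliance_type!r}")
    return {
        "node_id": node_id,
        "hostname": f"node{node_id:03d}",  # hypothetical naming scheme
        "services": list(APPLIANCE_SERVICES[kind]),
    }


def network_ok(host, port, timeout=3.0):
    """Sanity check: can we open a TCP connection to a known peer?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    plan = plan_first_boot(input("Node number on the disc: "),
                           input("Appliance type (tomcat/mysql): "))
    print("Would configure:", plan)
```

Keeping the plan separate from the part that touches the system means the menu answers can be validated, and rejected, before any config is activated. The `network_ok` check would run last, against whatever host the site treats as a reliable peer.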
If anyone asks, I’ll even go ahead and get über nerdy and post some detailed instructions on how to do it. But you have to ask. It’s pretty dry.
So there we go. A really terse description of why I *heart* appliances.
* Xen is by far the most technologically advanced hypervisor going. I strongly suggest evaluating it. Even though it doesn’t have the crazy-cool-gee-whiz features of the VMware VI, it’s still pretty amazing and sick fast.
**A note to the OCFS devs: fix your docs! They suck!


