Wednesday, July 12, 2006

ZFS saved my backside

Well, not totally, but it did save me a re-install!!!

Last night I decided to stay up and watch the webcast of the product launch of Sun's new servers (Hmmm, I want a Thumper...). In Thailand the festivities did not start till 12:30 at night. The laptop I am using is an Acer Ferrari 4005. Recently, Sun released a sound driver for it (I was using OSS before), but whenever I used it there was sound + static.

Now I had 1 hour before the webcast, so I decided to do the usual rounds of the OpenSolaris site to fill in time. I found that the latest ON build had a patch for the sound driver. Hmmm, there was not enough time for a compile (even on a Ferrari), so I downloaded the bfu archive and started the install. If you have read my earlier articles, you will probably know that I am running on a ZFS root. Since this is a largely undocumented feature, it does complicate a 'bfu'. I had done a bfu update on a ZFS root once before, so I should not have a problem, "Right?". The download took almost 1 hour (20Kb on a 4Mb link, grrr), so I quickly put the original bootadm back before I 'bfu'd'. After the 'bfu' came the usual 'acr', and then I replaced bootadm with the ZFS-modified bootadm and updated the boot archive. Just in time for a reboot right on the bell.

As I was rebooting, I was thinking, "I should have done a zfs snapshot before I started". It was late, and I was fully aware that a failure would only affect me. OK, reboot. . . . Ahhhhh!!! Almost as soon as it started booting, the Ferrari reset and booted again. Grrrr. I started to get 'Rhymes with MISSED'. I quickly edited the grub entry to add the "-kd" option to the kernel line, and booted. Great :(, the error message was that it could not find the root partition. OK, failsafe it is then.....

In failsafe, everything looked OK. I could mount the ZFS root. /etc/zfs/zpool.cache existed, and looked OK (gibberish; it is binary after all). I then decided to look at the boot files, and found that the file /boot/solaris/filelist.ramdisk did not contain zpool.cache. Ah, we now have somebody else to blame!!! The 'bfu/acr' procedure updates this file without considering that I may have added to it.
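For anyone who wants to check for this before rebooting, here is a rough sketch of the idea. The function names are made up for illustration, the exact form of the entry (etc/zfs/zpool.cache, relative to the root) is my assumption about how the file lists its paths, and it assumes a POSIX-style grep that understands -q and -F (on older Solaris that may mean /usr/xpg4/bin/grep).

```shell
# Hypothetical helpers, not part of bfu or bootadm.

# Succeeds if the file list <$1> mentions the entry <$2>.
filelist_has() {
    # -F: treat the entry as a fixed string, -q: exit status only
    grep -qF -- "$2" "$1"
}

# Appends the entry <$2> to the file list <$1> only if it is missing,
# so running it twice does not create a duplicate line.
filelist_ensure() {
    filelist_has "$1" "$2" || printf '%s\n' "$2" >> "$1"
}
```

Something like `filelist_ensure /boot/solaris/filelist.ramdisk etc/zfs/zpool.cache` after the 'acr' step, followed by the boot archive update, is the sort of thing I have in mind. Adjust the path if the root is mounted somewhere else, as it is in failsafe.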

Right, what to do now? I tried to update the boot archive from both failsafe and a spare UFS root, and kept on getting a "filesystem full" error on the ramdisk. Now, I have 1.5GB of memory and plenty of swap space, and I had modified the create_ramdisk script to double the amount of memory allocated. I still got the same message. "Marvellous".

This all prompted a late-night re-think. "I wish I had done a snapshot before I started the bfu....". OK, it was looking like a re-install of the root partition was on the cards. This was something for after the morning coffee... Hang on, after I installed and created a ZFS root, I did a snapshot to create a clone. That snapshot was still there! "zfs rollback intdisk/snv42_root@initial". Reboot. Hey, we are in business. I still have some driver/app installing and a JDS update to do, but no Solaris install. Fantastic. So I quickly brought up Firefox and connected to the webcast, just in time to hear the last sentence. Oh well, I will fix the rest up in the morning. Now if I had only done a snapshot again before the 'bfu'!!!!
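The lesson, written down as a small sketch so I remember it next time. The function names and the snapshot name 'pre_bfu' are made up for illustration; the dataset name is the one from my machine. The ZFS variable is only there so the commands can be previewed (for example with ZFS=echo) without touching a pool.

```shell
# Command to run; override with ZFS=echo to preview instead of execute.
: "${ZFS:=zfs}"

# Snapshot dataset <$1> under the name <$2> before a risky change.
pre_upgrade_snapshot() {
    "$ZFS" snapshot "$1@$2"
}

# Return dataset <$1> to snapshot <$2>, discarding changes made since.
# Plain 'zfs rollback' only works for the most recent snapshot; going
# back past later snapshots needs -r, which destroys those snapshots.
undo_upgrade() {
    "$ZFS" rollback "$1@$2"
}
```

So before the next 'bfu' it would be `pre_upgrade_snapshot intdisk/snv42_root pre_bfu`, and if the reboot goes wrong, `undo_upgrade intdisk/snv42_root pre_bfu` from failsafe or a spare root.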

ZFS Rocks!!!
P.S. Doing a ZFS snapshot takes less time than thinking about it.
