As some of you might know, my old server hard drive ran into a series of problems. And no, I don’t have a backup.
It all started when the server, a small ITX board with an Atom processor running VMware ESXi, would sometimes just stop responding, so a hard reboot was required. I found out that accessing a few places on the guest server’s OS triggered this (which is why my backups failed).
I ignored it until it died completely and I couldn’t boot the guest OS.
I should mention that the hard drive is a Seagate 1TB SATA2 drive.
I found out that the hard drive has bad blocks. When these bad blocks were accessed, the drive would just stop responding; a power cycle (unplugging the drive’s power and plugging it back in) would bring it back again. I tried a Unix dd, which clones the hard drive; Windows users might relate to Ghost or ImageCast. The dd command didn’t work, of course, as the drive would stop responding as soon as it reached the bad blocks.
I mentioned this to some of my associates at my weekly retro evening, where PHK and UJ both said: hey, you should try recoverdisk from FreeBSD. So I did.
recoverdisk is similar to dd-rescue: basically a resumable dd command.
Three weeks after starting, and after many rounds of unplugging and replugging power to the drive, stopping the command and restarting it, I was close to the end. I got to 100%, woohoo!! Or not: I found out that at some point I had rebooted the computer and forgotten to mount the target disk. There were two options: start over, or try to grab the part of the data that had ended up on the local disk instead. The latter seemed like a tedious task, so I went with a do-over. At the time of writing I’m around 87% done. Luckily for me it’s much faster this time, as I’m using the drives on a SATA controller, where before it was on a USB-to-SATA adapter.
Back to the superb tool, which is part of the FreeBSD operating system: recoverdisk.
recoverdisk is a command-line utility and is actually quite simple to use.
What recoverdisk does, given the right parameters, is dump the disk and keep a log file of what it has done. It takes all the good blocks first; if it hits a bad block, it tries to copy it, and if that fails it skips the block until the next pass. On the next pass it splits the remaining work into smaller chunks, and it repeats this until it finishes or is stopped.
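To make the multi-pass idea concrete, here is a minimal Python sketch of it. This is not the actual recoverdisk code: the recover function, the read_block callable and the in-memory image are all made up for illustration, and the default sizes are simply the ones the man page mentions.

```python
def recover(read_block, total_size, sizes=(1024 * 1024, 32 * 1024, 512)):
    """Copy what can be read, retrying failed ranges with smaller reads.

    read_block(offset, length) is a hypothetical callable that returns
    `length` bytes, or raises OSError if the range contains a bad sector.
    Returns the recovered image and the list of ranges that never read.
    """
    image = bytearray(total_size)      # unread areas stay zero-filled
    todo = [(0, total_size)]           # (offset, length) ranges still to read
    for size in sizes:                 # one pass per block size, big to small
        failed = []
        for start, length in todo:
            offset, end = start, start + length
            while offset < end:
                chunk = min(size, end - offset)
                try:
                    image[offset:offset + chunk] = read_block(offset, chunk)
                except OSError:
                    failed.append((offset, chunk))   # retry on the next pass
                offset += chunk
        todo = failed
        if not todo:
            break
    return bytes(image), todo
```

Each pass only revisits the ranges that failed on the previous one, so a single bad sector ends up costing one small block instead of a whole megabyte.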
My scenario might be a rare one: when it hit a bad block and skipped it, the disk would become unresponsive. I had to stop the command, unplug the power and plug it back in, then restart the command to continue. I had to do that a lot, and with every pass it would want smaller block sizes, making the dump even slower. This mostly happened because I left it working overnight; now I keep an eye on it and stop it when I can no longer keep it under surveillance.
How to use
Part of the man pages
SYNOPSIS
     recoverdisk [-b bigsize] [-r readlist] [-s interval] [-w writelist] source [destination]

DESCRIPTION
     The recoverdisk utility reads data from the source file until all blocks could be successfully read. If destination was specified all data is being written to that file. It starts reading in multiples of the sector size. Whenever a block fails, it is put to the end of the working queue and will be read again, possibly with a smaller read size.

     By default it uses block sizes of roughly 1 MB, 32kB, and the native sector size (usually 512 bytes). These figures are adjusted slightly, for devices whose sectorsize is not a power of 2, e.g., audio CDs with a sector size of 2352 bytes.

     The options are as follows:

     -b bigsize
             The size of reads attempted first. The middle pass is roughly the logarithmic average of the bigsize and the sectorsize.

     -r readlist
             Read the list of blocks and block sizes to read from the specified file.

     -s interval
             How often we should update the writelist file while things go OK. The default is 60 and the unit is "progress messages" so if things go well, this is the same as once per minute.

     -w writelist
             Write the list of remaining blocks to read to the specified file if recoverdisk is aborted via SIGINT.

     The -r and -w options can be specified together. Especially, they can point to the same file, which will be updated on abort.
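The "logarithmic average" in the -b description can be worked out by hand. The small sketch below is only my reading of that sentence (a geometric mean, with the exponent rounded up to a whole power of two); the function names are made up and the real tool may well compute the middle size differently.

```python
import math

def middle_size(bigsize, sectorsize):
    """Geometric ("logarithmic") mean of the two sizes, in bytes."""
    return math.sqrt(bigsize * sectorsize)

def round_up_to_power_of_two(n):
    """Smallest power of two that is >= n (assumed rounding, not from the man page)."""
    return 1 << math.ceil(math.log2(n))

# Defaults from the man page: roughly 1 MB reads and 512-byte sectors.
mid = middle_size(1024 * 1024, 512)      # about 23170 bytes, i.e. 2**14.5
print(round_up_to_power_of_two(mid))     # 32768, the 32kB the man page mentions
```

Under the same assumption, a bigsize of 131072 would give a middle pass of 8192 bytes.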
In its simplest form it’s like dd: read the disk device and write to an image file on a disk mounted on the system (usually my system drives aren’t as big as my data drives).
recoverdisk /dev/da0 /my/mounted/disk/image.dd
This command is like using dd.
For the initial run you should write to a log file:
recoverdisk -w /home/tomse/recoverlist /dev/da0 /my/mounted/disk/image.dd
This will write the list of remaining blocks to the specified file. For large drives that can take a long time to recover, my suggestion is to break out of it right away; that way the command can be restarted so that it reads the list file from its first pass onwards:
recoverdisk -r /home/tomse/recoverlist -w /home/tomse/recoverlist /dev/da0 /my/mounted/disk/image.dd
-r = read, -w = write. Quite logical and very easy to use.
That’s it. Let the drive run until it reaches 100%; you can break it at any time and run the command again (with the -r/-w parameters) to resume its work.
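Since I lost three weeks to an unmounted target disk, a small wrapper around the command is worth sketching. This is only an illustration in Python, not something that ships with FreeBSD: build_command and run are made-up names, and the only recoverdisk options used are -b, -r and -w from the man page.

```python
import os
import subprocess

def build_command(source, image, listfile, bigsize=None):
    """Build the recoverdisk command line; add -r only once the list file exists."""
    cmd = ["recoverdisk"]
    if bigsize is not None:
        cmd += ["-b", str(bigsize)]
    if os.path.exists(listfile):
        cmd += ["-r", listfile]        # resume from the saved list
    cmd += ["-w", listfile, source, image]
    return cmd

def run(source, image, listfile, mountpoint, bigsize=None):
    """Refuse to start unless the target disk is really mounted."""
    if not os.path.ismount(mountpoint):
        raise SystemExit(f"{mountpoint} is not mounted; refusing to write {image}")
    return subprocess.call(build_command(source, image, listfile, bigsize))
```

With the paths from this post it would be called as run("/dev/da0", "/my/mounted/disk/image.dd", "/home/tomse/recoverlist", "/my/mounted/disk"), where the last argument is an assumption about where the target disk is mounted.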
When using the USB adapter in the first run I got a message that I should change the block size to 131072 using the -b parameter; fortunately I haven’t needed to do so yet on the current setup.
recoverdisk -b 131072 -r /home/tomse/recoverlist -w /home/tomse/recoverlist /dev/da0 /my/mounted/disk/image.dd
By default, the block size is roughly 1 MB; 131072 bytes is 128 kB.
So a big thank you to Poul-Henning Kamp for implementing this in FreeBSD, and to Ulrich Sporlein for doing some fixes and, most importantly, writing the man page.
And to my associates at our weekly retro gathering.