Wednesday, November 3, 2010

Bit Rot, or Why Hard Drives Suck

The scary thing about hard disks is that failure can be silent. The whole device doesn't have to go bad for 10 or 50 or 1000 sectors to suddenly become unreadable. And unless you are actively checking the drive periodically with a SMART surface scan or a ZFS scrub, there could be bit rot you don't know about.

Here is an example of what bit rot looks like, from my home storage NAS:

[root@bsdnas /home/bthomson]# smartctl -l selftest /dev/ada0

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     22587         624048385

"Sorry, sir, I'd like to read your data at 624048385 but I just can't!"
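
To get a feel for where on the disk that LBA sits, here is a minimal Python sketch that converts it to a byte offset. It assumes 512-byte logical sectors, which is a guess about this drive and not something the self-test log states; the function name is made up for illustration.

```python
# Assumption: 512-byte logical sectors. Check what the drive actually
# reports before relying on this number.
SECTOR_SIZE = 512


def lba_to_byte_offset(lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Return the byte offset on the raw device where the given LBA starts."""
    return lba * sector_size


offset = lba_to_byte_offset(624048385)
print(f"LBA 624048385 starts at byte {offset} ({offset / 1024**3:.1f} GiB in)")
```

The offset only tells you how far into the raw device the bad sector is; mapping it to a particular file depends on the partition layout and filesystem.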

Some further inspection reveals this disk has 26 sectors it is concerned about and 17 which are fully hosed and cannot be read:

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       26
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       17
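
If you would rather not eyeball that table, the check is easy to script. What follows is a rough Python sketch, not a polished monitoring tool: it assumes the ten-column attribute layout shown above, and the function name and the set of watched attributes are my own choices.

```python
# Attributes whose raw value should normally be zero on a healthy disk.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def nonzero_attributes(smart_output: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes above zero."""
    found = {}
    for line in smart_output.splitlines():
        fields = line.split()
        # Assumed row layout: ID, name, flag, value, worst, thresh,
        # type, updated, when_failed, raw_value (10 columns).
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                found[fields[1]] = int(raw)
    return found


# The attribute rows from this post, pasted in as sample input.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       26
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       17
"""

for name, raw in nonzero_attributes(SAMPLE).items():
    print(f"WARNING: {name} = {raw}")
```

In practice you would feed it the text printed by `smartctl -A /dev/ada0` instead of the pasted sample, and run it from cron so the warning reaches you before the next scrub does.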

This drive is only 2 years old. If it were out of warranty, I would either replace it or overwrite it with zeros so that it would reallocate the pending sectors, and then hope for the best. But since a warranty exchange costs only $6 and these sectors have gone bad recently, exchanging it is probably not a bad idea.

Of course, the drive you get on exchange is refurbished and could be in even worse shape than the one you sent in, so it's somewhat of a crapshoot.
