Tuesday, March 30, 2010

Ban domains in Google results

I love the stars Google added to the search results and how they integrate with your Google Bookmarks. Now all they need to do is add an 'X' next to every result that bans the domain permanently so we can get rid of markmail, osdir, experts-exchange, ...

Friday, March 26, 2010

Python micro-optimization theatre: call dispatch

Python has no switch statement, so if you want to switch control flow based on the value of a variable there are two obvious ways to do it:
if v == 'a':
  ...
elif v == 'b':
  ...
elif v == 'c':
  ...
and
def a():
  ...

def b():
  ...

dispatch = {'a': a,
            'b': b,
            ...
           }

dispatch[v]()
How do these stack up speed-wise? It depends on the access pattern and how many branches you have. If the first branch will usually be taken (v == 'a'), you'll be better off with the first approach. If you usually have to get past four or more comparisons before finding a match, dict dispatch is as fast or faster.
# dict dispatch always takes the same amount of time
In [3]: %timeit f1('g')
1000000 loops, best of 3: 357 ns per loop

###################################

# 7 if statements before we find a match takes longer than
# dict dispatch
In [4]: %timeit f2('g')
1000000 loops, best of 3: 491 ns per loop

# Matching on the 4th comparison costs about the same as dict dispatch
In [7]: %timeit f2('d')
1000000 loops, best of 3: 359 ns per loop

# Taking the first branch is always faster
In [10]: %timeit f2('a')
1000000 loops, best of 3: 215 ns per loop
Code used:
def a():
  pass

def b():
  pass

def c():
  pass

def d():
  pass

def e():
  pass

def f():
  pass

def g():
  pass

dispatch = {'a': a,
            'b': b,
            'c': c,
            'd': d,
            'e': e,
            'f': f,
            'g': g}

def f1(v):
  dispatch[v]()

def f2(v):
  if v == 'a':
    pass
  elif v == 'b':
    pass
  elif v == 'c':
    pass
  elif v == 'd':
    pass
  elif v == 'e':
    pass
  elif v == 'f':
    pass
  elif v == 'g':
    pass
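
For anyone without IPython handy, here is a rough sketch of the same comparison using the standard timeit module. It is not the code that produced the numbers above: the names make_handler, dict_dispatch, if_chain and bench are made up for this example, and unlike f2 above both variants actually call a handler so they do the same work. The lambda wrapper adds a little overhead to every measurement, so only the relative numbers mean anything.

```python
import timeit


def make_handler(name):
    # Trivial handler that returns its own name so results can be checked.
    def handler():
        return name
    return handler


KEYS = "abcdefg"
HANDLERS = {k: make_handler(k) for k in KEYS}
a, b, c, d, e, f, g = (HANDLERS[k] for k in KEYS)


def dict_dispatch(v):
    # One hash lookup regardless of which key is requested.
    return HANDLERS[v]()


def if_chain(v):
    # Up to seven string comparisons before the matching branch is found.
    if v == 'a':
        return a()
    elif v == 'b':
        return b()
    elif v == 'c':
        return c()
    elif v == 'd':
        return d()
    elif v == 'e':
        return e()
    elif v == 'f':
        return f()
    elif v == 'g':
        return g()


def bench(func, arg, number=200_000, repeat=3):
    # Best-of-`repeat` time per call, in nanoseconds.
    best = min(timeit.repeat(lambda: func(arg), number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    for key in ("a", "d", "g"):
        print(f"{key}: dict {bench(dict_dispatch, key):.0f} ns, "
              f"if/elif {bench(if_chain, key):.0f} ns")
```

Absolute timings will differ a lot between machines and Python versions, so treat the crossover point as something to measure rather than assume.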

Tuesday, March 23, 2010

ZFS NAS for Home Backup on FreeBSD 8

Update one year later: I'm quite happy with this server. In retrospect, I should have gotten a board with more SATA ports or a PCI Express x1 slot. With only 4 SATA ports I ended up having to run two disks off a PCI SATA card, which gets far less than optimal throughput, but performance is still mostly acceptable for my uses.

The CPU speed is a little bit annoying: it is not normally an issue but definitely noticeable when trying to rebuild the world after a FreeBSD upgrade, for example. Still, I think I am willing to put up with occasional slowness for the ultra-low power use.

I'll probably keep running this thing until the hardware breaks since it works well enough.


Disclaimer: I am neither a sysadmin nor a data storage expert, just a guy trying to solve his problems without spending too much $$$. I have less than perfect understanding of these issues! Please leave a note if you find a mistake.

--

Data storage is always an interesting challenge. Most of my text data (source code, writing, etc) is mirrored to every device I own and a few in the cloud. This is my most valuable stuff and I'm pretty confident the cloud and all the devices I own aren't going to go kaput simultaneously, so it's mostly safe.

But I can't mirror my videos and digital photos and such to every device; there are way too many bits!

Remote backup/storage over the Internet is possible, but also expensive and slow: bandwidth and storage costs are high and available bandwidth itself is also pretty limited unless you are blessed with a fiber connection (I'm not). Worse, even though many backup services claim your data will be safe, if you look at the fine print most of them will only accept liability up to whatever you paid for the service in the event of lost data. That means their incentives are not really that great and you need to keep at least one reliable local mirror of everything as well. (Of course there are also annoying security implications for remote backup that are less of an issue if you can keep everything at home.)

So for now I have chosen to forgo the remote option and just manage my own storage and backups locally. For three years or so I have been keeping everything in a big ext3 filesystem on software raid 5[1] with periodic (read: not often enough) backups via rsync to another ext3 on raid 6 (with older, less reliable disks). This has worked fine and I have not even had a dead volume in either of the arrays yet. But in addition to total drive failure there is also bitrot to worry about, and as the amount of data in my arrays grows I feel less confident in this solution. ext3 doesn't do snapshots, either, so if something gets trashed on the main array and backed up to the secondary I'm hosed. (Not to imply that's ext3's fault: there are plenty of userspace tools that could solve this problem, I just don't use any.)

Neither does ext3 store checksums for files and so file integrity on the main array cannot be reverified on an automated basis. If everything is working properly there will be no errors, of course, but this assumption is not rock solid for a variety of reasons (crashes, memory corruption, kernel bugs, etc). It would be nice to be able to check all my files every week or so and have some certainty that they are still ok.
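
To make the "check all my files every week" idea concrete, here is a minimal userspace sketch of what that could look like without filesystem support. It is only an illustration, not something I actually run: the function names (sha256_of, build_manifest, changed_files) are invented for this example, and it assumes the files are not supposed to change between runs, since a plain checksum cannot tell a legitimate edit from corruption.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    # Hash the file in 1 MiB chunks so large videos don't need to fit in RAM.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    # Map each file's path (relative to root) to its SHA-256 digest.
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def changed_files(old, new):
    # Files present in both manifests whose digests no longer match.
    return sorted(p for p in old if p in new and old[p] != new[p])
```

You would save the manifest from one run and compare it with the next. Unlike checksums kept by the filesystem, this only detects a problem; it cannot repair anything, and it re-reads every byte each time.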

Enter ZFS. ZFS presently has many advantages over ext3/4 with software raid[2] including deduplication, elimination of the raid-5 write hole, awareness of the RAID-like redundant data at the filesystem layer, end-to-end checksumming, and better error reporting. As many users' experiences show, there can often be unexpected problems somewhere in the data pipeline that cause bit errors to show up. The checksums are great here because they enable these problems to be detected and then corrected if there is enough redundant data. That's a killer feature, especially for my cheap commodity hardware which is (probably) prone to memory errors.[3]

Of course ZFS has disadvantages too. It was designed to run on reliable enterprise machines with lots of spare resources that never corrupt bits, so it sucks memory and there isn't even a fsck tool. There has been some discussion about that last issue, but it doesn't look like Sun's interested in adding a fsck. That means having backups of data stored in ZFS is very important because if the filesystem metadata gets screwed up you are basically hosed. Editing metadata by hand? No thanks.

Despite these issues, I think ZFS is a win overall if you keep backups, so I decided to migrate my data.

Unless you want to use ZFS on FUSE your choices for OS are basically FreeBSD or OpenSolaris. OpenSolaris has ZFS v21 with deduplication and some other nice features, but FreeBSD 8 only has ZFS v14. Despite the newer version I am a bit soured on the idea of running an OpenSolaris machine since apparently Sun doesn't even supply security patches. I wouldn't want to use an OpenSolaris machine as anything but a NAS behind my firewall, then, and that seems like a waste of hardware. Not to mention the uncertain prospects of OpenSolaris in light of the Oracle acquisition.

FreeBSD 8 it is, then. I've never used FreeBSD before, but getting it installed was a breeze and it even emails me daily status and security reports. Awesome.

Low power consumption is more important to me than speed so I chose the GIGABYTE GA-D510UD with Intel Atom D510. It has 4 on-board SATA ports and I have 4 disks in my main array.

ZFS scrubbing uses all 4 logical cores available on the Atom, which is good because its single-threaded performance is pretty bad. Bad, as in, my five-year-old laptop beats the pants off it. Scrubbing appears to be CPU-bound rather than IO-bound, which is a bummer, but at least it uses hardly any power. I don't have a Kill-a-Watt but with the high-efficiency PSU I'd guess it's under 45W at idle even with 5 disks spinning.

To actually get to the data I exported an NFS share which I can use from my Ubuntu clients. One problem I ran into is that FreeBSD assigns the first user UID and GID 1001 while Ubuntu assigns them 1000 (the UID and GID need to be the same on both sides for NFS to work properly). Rather than try to change files everywhere on the Ubuntu systems I changed both ids to 1000 on the FreeBSD server and all seems well. Hopefully this won't cause any problems down the road.

Performance over the network isn't great; I get about 25MB/s sustained read/write for single files but generally much slower performance when lots of files are involved. I think this might have something to do with NFS's sync semantics, but I'm not very familiar with it. There's plenty more bandwidth on the gigE so I could keep tweaking on it, but I don't really need it to be faster.

So now the only things left to do are to set up weekly scrubbing of the array, set up periodic zfs snapshots, and to convert my backup server over to FreeBSD and ZFS too. I might cover those in a later post if I have time.
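
As a rough idea of what the scrub and snapshot jobs might look like, here is a small sketch that could be run from cron. The pool name tank, the dataset tank/data, the auto-YYYY-MM-DD snapshot naming and the helper names are all assumptions for illustration; the only real commands involved are zpool scrub and zfs snapshot. By default it only prints what it would run.

```python
import datetime
import subprocess


def snapshot_command(dataset, when=None):
    # Name the snapshot after the date, e.g. tank/data@auto-2010-03-23.
    when = when or datetime.date.today()
    return ["zfs", "snapshot", f"{dataset}@auto-{when.isoformat()}"]


def scrub_command(pool):
    # Starts a scrub in the background; `zpool status` reports progress.
    return ["zpool", "scrub", pool]


def run(cmd, dry_run=True):
    # With dry_run=True nothing is executed, the command is only printed.
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    # Hypothetical pool and dataset names.
    run(snapshot_command("tank/data"))
    run(scrub_command("tank"))
```

Old snapshots still need to be pruned by something, which this sketch does not attempt.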

I'm hoping that with the main array replicated to a secondary via zfs send/receive I will basically not have to worry about metadata corruption (short of a bug in zfs), but I'm not 100% sure if this is the case. I guess the super-paranoid thing to do would be to make tape backups too, but I really don't want to go there.
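
For reference, the send/receive replication boils down to piping one command into another. The sketch below builds that pipeline; the dataset and snapshot names are hypothetical, and it assumes both pools are attached to the same machine. For a separate backup server the receiving side would have to be run over ssh, which is not shown here.

```python
import subprocess


def send_command(dataset, new_snap, prev_snap=None):
    # A full stream when prev_snap is None, otherwise an incremental
    # stream containing only the changes between the two snapshots.
    cmd = ["zfs", "send"]
    if prev_snap is not None:
        cmd += ["-i", f"{dataset}@{prev_snap}"]
    cmd.append(f"{dataset}@{new_snap}")
    return cmd


def receive_command(target):
    return ["zfs", "receive", target]


def replicate(dataset, target, new_snap, prev_snap=None):
    # Equivalent to: zfs send [-i old] new | zfs receive target
    sender = subprocess.Popen(send_command(dataset, new_snap, prev_snap),
                              stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive_command(target), stdin=sender.stdout)
    sender.stdout.close()  # so the sender notices if the receiver exits early
    receiver.wait()
    sender.wait()
    return sender.returncode == 0 and receiver.returncode == 0
```

An incremental send only works if the previous snapshot still exists on both sides, so whatever prunes snapshots has to leave the most recent common one alone.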

--

[1] One problem with using RAID 5 on cheap drives is that the chances of having an unrecoverable error on one of the good drives after a whole disk fails are actually quite high. (some comments on this issue) So even if you don't expect 2 disks to fail simultaneously it may not be a bad idea to go with RAID 6.

[2] For various reasons Linux is now playing catch-up when it comes to filesystems. Btrfs promises many of the advantages of ZFS when it's ready for production, but it's not there yet. Since Oracle bought Sun and now is responsible for ZFS and Btrfs, the situation is a bit weird at the moment.

[3] None of that fancy raid stuff does you any good if you are writing garbage to the disk. For almost 2 months I had my array running in a machine with bad ram! Who knows what files might have been corrupted in that time... without checksums it's hard to tell.

ECC memory adds some protection from this kind of thing but it really adds to the cost, too: the cheapest motherboard on Newegg that supports ECC is a year and a half old and costs $124. (Non-ECC boards start at $40) And, to really be safe, all your workstations and laptops that will access the NAS should have ECC memory as well, right? So much for keeping it cheap... Still, if statistics are to be believed, ECC memory is more important than commonly believed.

If the market really cared about ECC I think it would be a relatively cheap addition, but probably most users of PCs (corporate and home alike) are so much more overwhelmingly likely to have crashes and data corruption due to their operating system software or operator error that any failures due to bit flips just get lost in the noise. A shame, 'cos I really wish it wasn't such an expensive feature.
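
Going back to footnote [1], the RAID 5 rebuild risk is easy to put a rough number on. The sketch below assumes an unrecoverable read error rate of one per 10^14 bits, a figure commonly quoted on consumer drive spec sheets, and a hypothetical array of four 1 TB disks. Neither number comes from my own hardware, and real drives may well do better or worse than the spec.

```python
import math


def prob_unrecoverable_read(bytes_read, ure_per_bit=1e-14):
    # Probability of at least one unrecoverable error while reading
    # bytes_read bytes: 1 - (1 - rate) ** bits, computed via log1p/expm1
    # so the tiny per-bit rate does not get lost to rounding.
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Rebuilding a 4-disk RAID 5 means reading the 3 surviving disks in full.
    surviving_bytes = 3 * 10**12  # three hypothetical 1 TB disks
    print(f"{prob_unrecoverable_read(surviving_bytes):.1%}")
```

Under those assumptions the rebuild hits an unreadable sector roughly one time in five, which is why the second parity disk of RAID 6 starts to look attractive.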

Thursday, March 4, 2010

Simulate Slow Network in Firefox, Chrome, etc

I assumed it would be difficult to set up network delays at the kernel level so I was looking around at browser extensions. Firefox has Tamper Data and Firefox Throttle, but the former only delays HTTP requests and the latter only works on Windows. Not much available for Chrome yet.

But actually, it is very easy to simulate delays at the kernel level:

# "add" attaches a netem qdisc; use "change" instead if one is already attached
$ sudo tc qdisc add dev lo root netem delay 400ms
$ ping localhost
PING violet (127.0.0.1) 56(84) bytes of data.
64 bytes from violet (127.0.0.1): icmp_seq=1 ttl=64 time=800 ms

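Increasing latency is the easiest way to see results (the ping above reports 800 ms because the 400 ms delay is applied to both the request and the reply on the loopback device), but netem can also simulate throttling, packet loss, and other kinds of problems. Way cool! To remove the delay again, run sudo tc qdisc del dev lo root.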
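
If you would rather check the delay from a program than from ping, something like the following sketch works: it starts a throwaway TCP echo server on localhost and times a few round trips. The helper names are made up for this example. With the netem rule from above in place I would expect each round trip to come out at roughly twice the configured delay, but I have not measured that with this script.

```python
import socket
import threading
import time


def start_echo_server():
    # Listen on an ephemeral localhost port and echo one connection's data.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _addr = server.accept()
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                conn.sendall(data)
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


def measure_rtt_ms(port, rounds=5, payload=b"ping"):
    # Time how long each payload takes to come back, in milliseconds.
    timings = []
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            start = time.perf_counter()
            sock.sendall(payload)
            reply = b""
            while len(reply) < len(payload):
                chunk = sock.recv(1024)
                if not chunk:
                    raise ConnectionError("echo server closed the connection")
                reply += chunk
            timings.append((time.perf_counter() - start) * 1000)
    return timings


if __name__ == "__main__":
    for ms in measure_rtt_ms(start_echo_server()):
        print(f"{ms:.1f} ms")
```

Because the delay sits on the loopback device, it affects everything that talks to localhost, including this script's connection setup.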