I've recently been toying with a 4 drive gmirror (RAID1) array for a file system that does mostly random reads. One drive is currently synchronising, and the array itself is idle, so I thought I'd do a quick experiment and see which load balancing algorithm works best. Each algorithm is sampled using iostat with a period of 60 seconds. The results are surprising.
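For reference, each sample was gathered from iostat's extended statistics over a single 60-second interval; roughly like this (the exact flags may have differed, and the first since-boot report is discarded):

# one 60-second extended-statistics sample per algorithm
iostat -x -w 60 -c 2 ad12 ad14 ad16 ad18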
"split" algorithm, 4096 bytes. (This is the default if you do not explicitly configure an algorithm.)
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad12     344.5   0.0 14467.1     0.0    0   3.3  73
ad14     344.4   0.0 14810.8     0.0    1   1.1  30
ad16     344.5   0.0 14811.5     0.0    2   1.7  44
ad18       0.0 347.9     0.0 44091.8    0   1.5  41
Note the high IOPS from each drive (presumably because of the 4096 byte split level), and that the destination drive is only 41% busy.
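For anyone wanting to repeat this: the balance algorithm (and, for "split", the slice size) can be changed on a live mirror with gmirror configure. A quick sketch, using the mirror name db0 from the dd test further down:

# select the balance algorithm and, for split, the slice size in bytes
gmirror configure -b split -s 4096 db0
# or, for the next test:
gmirror configure -b round-robin db0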
"round-robin" algorithm.extended device statistics device r/s w/s kr/s kw/s wait svc_t %b ad12 115.6 0.0 14798.9 0.0 0 10.0 85 ad14 115.6 0.0 14796.8 0.0 0 1.3 16 ad16 115.6 0.0 14794.7 0.0 1 2.3 25 ad18 0.0 350.2 0.0 44390.0 1 1.2 35
Similar overall performance to "split", except that the reads are not split into such small amounts. Why do the busy levels of the (identical) source drives vary so remarkably?
"load" algorithm.extended device statistics device r/s w/s kr/s kw/s wait svc_t %b ad12 12.0 0.0 1535.5 0.0 0 4.5 3 ad14 331.4 0.0 42418.3 0.0 0 0.9 27 ad16 133.8 0.0 17120.2 0.0 0 1.2 13 ad18 0.0 481.9 0.0 61076.4 2 3.0 92
Wow. We've just seen a nearly 50% improvement in sequential read speed. The destination drive is also pretty busy, which is good. This is the best-performing algorithm in this simple test.
"prefer" algorithm.extended device statistics device r/s w/s kr/s kw/s wait svc_t %b ad12 0.0 0.0 0.0 0.0 0 0.0 0 ad14 0.0 0.0 0.0 0.0 0 0.0 0 ad16 472.5 0.0 60483.8 0.0 1 0.9 37 ad18 0.0 477.2 0.0 60486.2 1 3.3 96
Reading from a single drive is faster than "split" and "round-robin", and almost as fast as "load."
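Worth noting for "prefer": all reads go to the active component with the highest priority, so which drive ends up doing the work depends on the per-component priorities rather than on load. Something like this (a sketch; the -p form of configure may only exist in releases newer than 7.1):

gmirror configure -b prefer db0
# newer releases can change the priority of an existing component, e.g.
gmirror configure -p 1 db0 ad16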
Are these results skewed by the unusual configuration of a 4 drive mirror and the fact that it's a rebuild rather than normal operation? Hmmm...
UPDATE: Array rebuild is complete. Doing a random read test (executing dd if=/dev/mirror/db0 of=/dev/null iseek=<random_number> bs=16k count=1 repeatedly) has produced even more confusing results. All algorithms are showing very similar numbers, with an effective random read rate of between 133.8 and 140.3 16k blocks per second. The highest value of 140.3/sec is actually from the prefer algorithm, reading from a single drive only! There appears to be zero benefit, for this synthetic test anyway, to having more than one drive to read the data from.
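The loop behind that test is nothing fancy; a sketch of it (the block count is derived from the device size rather than hard-coded, and no special effort is made to defeat drive caching):

# read one random 16k block from the mirror, forever
blocks=$(( $(diskinfo /dev/mirror/db0 | awk '{print $3}') / 16384 ))
while true; do
    dd if=/dev/mirror/db0 of=/dev/null bs=16k count=1 \
        iseek=$(jot -r 1 0 $((blocks - 1))) 2>/dev/null
done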
System: FreeBSD 7.1-RELEASE amd64, Gigabyte GA-EX38-DS4 mainboard, 4 x 2GB A-DATA DDR2-800 RAM, 4 x WD7500AAKS 750GB drives connected to onboard SATA ports in AHCI+native mode, gmirror configured to use 4 drives.
UPDATE #2: FreeBSD 8.0 has a much improved "load" algorithm which does balance correctly across the drives. I ran a 4 x 1TB drive RAID1 array for a few months before changing to RAID10 (which also benefits from the new algorithm).
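For completeness, the RAID10 setup is just gstripe over two gmirror pairs. A rough sketch with placeholder device names (ada0 through ada3) and an arbitrarily chosen stripe size:

# two mirrored pairs, striped together
gmirror label -v -b load m0 /dev/ada0 /dev/ada1
gmirror label -v -b load m1 /dev/ada2 /dev/ada3
gstripe label -v -s 131072 data /dev/mirror/m0 /dev/mirror/m1
newfs -U /dev/stripe/data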