About this blog: Computers hate me. They really do. Every time I try to do something unusual like add new hardware, something is guaranteed to go wrong. I decided to start writing about my constant problems so that someone else might benefit from my experiences - or at least laugh at them!
Some users may not be aware that FreeBSD will unconditionally panic if a drive that is mounted goes away for some reason. This includes unplugging removable devices such as a USB stick. The entire OS instantly ceases to operate, like a Windows blue screen.
I wandered into the office and noticed a screen full of WRITE_DMA TIMEOUT error messages about a particular drive. Checking the logs on other machines showed the server had been unresponsive for at least 4 hours. Ctrl-Alt-Del didn't do anything (it normally forces a reasonably graceful shutdown), so I tried one last thing before hitting reset: unplugging the drive. I figured that because it was part of a mirror, FreeBSD would detect it as failed, disconnect it from the mirror, and happily continue with what it was doing.
Unfortunately, I was wrong! It panicked and locked up with a very loud continuous beep. Fun at 5:30am when your family is asleep...
I suspect it's due to dodgy SATA cables: I have found the red "COMAX" type supplied with Asus motherboards to be quite unreliable, often throwing up CRC errors under FreeBSD (or, in the case of Windows + Intel Matrix Storage, disconnecting the drive from the array, or even rebooting). I haven't had a single problem with the yellow locking type supplied with Gigabyte motherboards, but I'd run out of them when I installed this drive a couple of days ago. :(
The question remains: why did issues on a single SATA port cause the entire server to grind to a halt, and why hadn't gmirror just disconnected the drive from the array after 4 hours' worth of DMA timeouts? It's not the first time I've seen a single faulty drive in a mirrored array end up killing the whole server...
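One way to notice this kind of failure sooner might be a small watchdog that counts timeout lines per drive in the system log, so the drive can be pulled from the mirror by hand long before 4 hours have passed. What follows is only a rough Python sketch. The log line layout, the `ad` device naming, the function names, and the threshold are all assumptions made for illustration; this is not something gmirror does itself.

```python
import re
from collections import Counter

# Assumed log layout: a device name such as "ad8" followed by a colon,
# with both "WRITE_DMA" and "TIMEOUT" somewhere on the same line.
DEVICE_RE = re.compile(r"\b(ad\d+):")


def count_dma_timeouts(lines):
    """Return a Counter mapping device name -> number of WRITE_DMA timeout lines."""
    counts = Counter()
    for line in lines:
        if "WRITE_DMA" in line and "TIMEOUT" in line:
            match = DEVICE_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


def suspect_drives(lines, threshold=10):
    """Return devices with at least `threshold` timeout lines (threshold is arbitrary)."""
    counts = count_dma_timeouts(lines)
    return sorted(dev for dev, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical sample lines, made up for illustration only.
    sample = [
        "ad8: TIMEOUT - WRITE_DMA retrying",
        "ad8: TIMEOUT - WRITE_DMA retrying",
        "ad4: some unrelated message",
    ]
    print(suspect_drives(sample, threshold=2))
```

In practice the lines would come from the system log (for example by reading `/var/log/messages`), and the regular expression would need adjusting to whatever the kernel actually prints on the machine in question.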
UPDATE: I swapped out a yellow/Gigabyte cable from a box with a disk caddy that wasn't in use, but accidentally unplugged the wrong motherboard port... one which was connected to an active HD rather than an empty caddy receptacle. Oops. However, FreeBSD reported "GEOM_MIRROR: Device gm0: provider ad8 disconnected" and continued... as you would expect. I wonder why the same thing didn't work on the other server?
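For anyone who wants to be told when a mirror quietly drops a drive like this, the message quoted above is easy to pick out of a log. The Python below is a minimal sketch, assuming the message always has exactly the form quoted in this post; the function name `find_disconnects` is made up for the example, and any other GEOM_MIRROR messages are simply ignored.

```python
import re

# Pattern built from the single message quoted in this post:
#   GEOM_MIRROR: Device gm0: provider ad8 disconnected
DISCONNECT_RE = re.compile(
    r"GEOM_MIRROR: Device (\S+): provider (\S+) disconnected"
)


def find_disconnects(lines):
    """Return a list of (mirror, provider) pairs, one per disconnect message."""
    found = []
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            found.append(match.groups())
    return found


if __name__ == "__main__":
    log = ["GEOM_MIRROR: Device gm0: provider ad8 disconnected"]
    for mirror, provider in find_disconnects(log):
        print(f"mirror {mirror} lost provider {provider}")
```

A cron job could run something like this over the system log and send an email, so a degraded mirror does not go unnoticed until the second drive fails.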