I have a friend that I talk about computer parts and prices with non-stop. He spends a lot of time down in the trenches with enthusiasts and keeps a close eye on prices for almost all PC components. Recently he let me know that 500GiB Western Digital hard drives (WDC500AAKS) were going used for about $50. My original array was built out of 6 of these drives, and brand new 1TiB drives are hovering around $100. So a couple of FedEx boxes later I am the proud new owner of some 1TiB drives.
I have heard horror stories about these drives regarding RAID arrays, with them dropping out for no reason and causing the array to degrade and rebuild. Looking a little further, in particular at the Newegg reviews, I found that enabling TLER (Time-Limited Error Recovery) reportedly solves the RAID issues on these drives. In fact one of the only major differences between the Caviar line of HDDs from Western Digital and the RE line is this TLER feature. Luckily for the poor college kids in the computing crowd, Western Digital makes a tool named WDTLER that will enable or disable TLER on Western Digital HDDs.
TLER exists because there are sometimes conflicts over whether error handling should be undertaken by the HDD or the RAID controller. If TLER is not enabled, a drive that spends too long on its own error recovery can be marked as unusable and dropped, which degrades the array and causes significant performance degradation. With TLER enabled the HDD limits its own error recovery to the amount of time that has been set and then leaves the error to the RAID controller. In my case I have set the drives to their recommended RAID times of 7 seconds.
WDTLER
WDTLER is a great tool, however it is written for DOS which is… quite terrible. I desperately tried to make a bootable DOS disk from a CD-R, however I could not find a version of DOS that would mount the CD-R after it booted up. After about 4 hours of struggle trying different versions I gave up and moved in a very sad direction… a 3.5″ floppy drive. After much scrounging and borrowing an FDD cable from my good friend, I made a DOS bootable disk on Windows and unzipped the WDTLER files onto the disk (WDTLER.zip is mirrored below). Booting into the disk you can run “TLER-ON” or “TLER-OFF” to enable or disable TLER with a 7 second delay. Alternatively you can use this command to set the delays yourself:
WDTLER -R<seconds> -W<seconds>
Notice that this utility will change the TLER on any HDD that is connected to the system. After running “TLER-ON” I was greeted with this output (and a fatal error inside WDTLER, but it works so don’t worry):
WDTLER Version 1.03
Copyright (C) 2004-2006 Western Digital Corporation
Western Digital Time Limit Error Recovery Utility
Model: WDC WD1001FALS-00J7B0 Serial Number: WD-WMATV1234567
Read TLER time is 7.000 seconds.
Write TLER time is 7.000 seconds.
Model: WDC WD1001FALS-00J7B0 Serial Number: WD-WMATV1234567
Read TLER time is 7.000 seconds.
Write TLER time is 7.000 seconds.
Model: WDC WD1001FALS-00J7B0 Serial Number: WD-WMATV1234567
Read TLER time is 7.000 seconds.
Write TLER time is 7.000 seconds
3TiB of drives have now been prepped to be initialized into RAID. I used the notes from my other post on using MDADM for software RAID to set the array up. Pretty soon I will be adding some more drives to this setup so check back for how to use MDADM to grow your array onto a new set of drives.
Extra
I have had an issue in the newer versions of Ubuntu (9.04 and possibly 9.10) where the array will initialize properly, then on restart will be completely fubar’ed. This is likely due to the mdadm.conf file, so to make sure it is updated properly please run this command:
sudo mdadm --examine --scan --config=mdadm.conf | sudo tee /etc/mdadm/mdadm.conf
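Since that command replaces the whole config file, it can be worth keeping a copy of the old one first. Here is a minimal sketch of how that might look. The function name save_mdadm_conf is my own invention (it is not part of mdadm), and it assumes the scan output arrives on stdin and contains at least one line starting with ARRAY.

```shell
#!/bin/sh
# Sketch only: save_mdadm_conf is a hypothetical helper, not an mdadm feature.
# It reads new config lines on stdin, keeps a .bak copy of the old file,
# and refuses to overwrite anything if the scan produced no ARRAY lines.
save_mdadm_conf() {
    conf="$1"
    tmp="$(mktemp)" || return 1

    # Capture the scan output first so a failed scan never truncates the real file.
    cat > "$tmp"

    if ! grep -q '^ARRAY' "$tmp"; then
        echo "no ARRAY lines found; leaving $conf untouched" >&2
        rm -f "$tmp"
        return 1
    fi

    # Keep the previous config next to the new one.
    if [ -e "$conf" ]; then
        cp -p "$conf" "$conf.bak" || { rm -f "$tmp"; return 1; }
    fi

    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

From a root shell this could be used as: mdadm --examine --scan | save_mdadm_conf /etc/mdadm/mdadm.conf. If the new file turns out to be wrong, the old one is still sitting in /etc/mdadm/mdadm.conf.bak.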
Resources
Time-Limited Error Recovery (Wikipedia)
Western Digital Time-Limited Error Recovery (TLER) with RAID and SE, SE16, GP Models (HardForum)
WDTLER Utility (Init Blog)
Have you created a Linux software RAID 1, without a RAID controller, on a Linux machine with Western Digital RE3 drives? Are there problems using these hard drives in a software RAID with TLER?
The RE devices actually have TLER enabled already. You won't have to do anything special.
I have not used any RE3 drives, however I have run RAID 1 using mdadm and it is quite easy to set up.
So what are the best practices for using mdadm to create a RAID 1 with WD RE drives? Should TLER be enabled (7 seconds) or disabled if I would like to create a software RAID with mdadm?
Best Regards
Andrea
TLER is a setting on the actual hard disk; your software RAID using mdadm does not modify any properties of the hard disk.
With the RE drives TLER is already enabled and set to 7 seconds. You will need to create partitions on both of the drives (let's call them /dev/sdb1 and /dev/sdc1), then you can run the mdadm command:
sudo mdadm --create /dev/md0 --chunk=64 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
I am not positive that the chunk size is appropriate; you probably need to dig around for that.
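Once the array is created, the kernel reports its state in /proc/mdstat, where a healthy two-disk mirror shows [UU] and a mirror with a missing member shows an underscore in that bracket. A rough sketch for spotting degraded arrays follows. The function name md_degraded is made up for this example, and the optional file argument is only there so it can be tried on a saved copy of mdstat.

```shell
#!/bin/sh
# Sketch only: md_degraded is a hypothetical helper name.
# Prints the name of every md array whose member status bracket
# (e.g. [UU] or [U_]) contains an underscore, meaning a member is missing.
md_degraded() {
    awk '
        # Lines like "md0 : active raid1 sdc1[1] sdb1[0]" start a new array.
        /^md[0-9]+ :/ { name = $1 }
        # The status bracket contains only U and _ characters.
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
        }
    ' "${1:-/proc/mdstat}"
}
```

Running md_degraded with no arguments reads the live /proc/mdstat and prints nothing when every array has all of its members. For the full picture on one array, sudo mdadm --detail /dev/md0 is the place to look.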
[...] http://blog.agdunn.net/?p=208 Here is a link to my friend's blog where he shows how he did this in Linux. [...]
Just wondering if you had played around with /sys/block/sd?/device/timeout as a mechanism to delay the timeouts that the RAID is experiencing for the device, i.e. as an alternative to setting the TLER low, setting the disk timeout higher. I’m just not sure if the above setting will affect this.
I’m experiencing a disk failure of this kind every 4 weeks or so and it takes about 24 hrs to resync the array. I’d be more than happy for the devices to wait a couple of minutes once a month if it meant I didn’t have to resync the array again.
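For anyone who wants to experiment with the timeout file mentioned above, here is a hedged sketch of reading and raising it for every sd device. The function name set_disk_timeouts and the SYSFS_ROOT override are assumptions made for this example (the override just lets you try it on a fake directory tree), and whether a longer timeout actually stops these drives from dropping out is untested here.

```shell
#!/bin/sh
# Sketch only: set_disk_timeouts and SYSFS_ROOT are hypothetical names.
# Shows the current command timeout (in seconds) for each sd* disk and
# writes the new value. Values written to sysfs do not survive a reboot.
set_disk_timeouts() {
    seconds="$1"
    root="${SYSFS_ROOT:-/sys/block}"

    # Only accept a plain positive integer number of seconds.
    case "$seconds" in
        ''|*[!0-9]*) echo "usage: set_disk_timeouts SECONDS" >&2; return 1 ;;
    esac

    for f in "$root"/sd*/device/timeout; do
        [ -e "$f" ] || continue        # the glob matched nothing
        printf '%s: %s -> %s\n' "$f" "$(cat "$f")" "$seconds"
        echo "$seconds" > "$f"         # needs root on a real system
    done
}
```

For example, set_disk_timeouts 120 from a root shell would give every sd device two minutes before the kernel gives up on a command. Because the setting is lost on reboot, it would have to be reapplied at boot to stick.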
I have never played with the timeout settings. I also never experienced drive failures like you describe.
The drives are not truly failing, right? Just dropping from the array?
Hi,
I’m wondering how to apply this TLER patch to drives already in a RAID configuration using ICH10R. Would I have to basically destroy and rebuild the RAID to do this? (Urk…!)
It isn't a patch that will affect you at the OS level. It is a setting in the firmware, to my knowledge. This will make the devices wait for the controller to resolve an issue instead of trying to resolve it themselves.
When the drives try to resolve an I/O issue, frequently the controller will mark them as unresponsive/failed. TLER is a setting to make the drives wait for controller resolution.
[...] other things to be able to use RAID properly. Check out my friend's blog where he talks about it. http://blog.agdunn.net/?p=208 Make sure to look at the Wikipedia TLER link…… [...]
Warning: the newer versions of the WD1001FALS have TLER permanently locked to the disabled state.
If the extended device timeout doesn’t solve the issue, this makes these devices completely useless for RAID. Also, you would get one of the TLER-less devices if one of your older ones with TLER support ever needs a replacement under warranty…
So, the real golden question is: does increasing the device timeout fix these drives' absurd command timeout delays?
It is unacceptable that WD has decided to remove TLER as a feature on these drives. RAID is rapidly becoming common on systems due to the low drive prices, so it would be far more useful if manufacturers provided firmware support for TLER or otherwise made their drives configurable for use in RAID.
It is becoming common for people to start with a single drive with the intention of later converting to RAID. This is now more difficult, since builders must decide whether that is going to happen before building each system, and the owner can no longer change their mind later and move to RAID.
Worse, by removing an existing feature from a current model, WD has opened a rat's nest of trouble for technicians who want to go to RAID, as they must now check the existing drive.
Given that WD warranties the drives for 5 years (and many will exceed that lifetime), WD has needlessly created a problem which will be with us for at least a decade.
Perhaps if enough people complain they will reverse this annoying decision?