I am obsessed with disk space, and with that comes an obsession with redundancy. It is a fact that the hard disk drives you purchase will only work for a limited and unpredictable amount of time. That might be long enough that you upgrade or replace a drive before it ever fails, but if you have ever had an unexpected drive failure and lost information, then you know how terrible an experience it is.
I need about 1.5 TiB to be comfortable with the amount of data I currently have. I already had a couple of WD 500AAKS drives in a machine, so I figured that I would max out the SATA ports on a motherboard and use that machine as a dedicated file silo.
For my purposes I have chosen to run RAID 5 so that I can take maximum advantage of my five drives' storage space and still have parity for redundancy.
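As a rough capacity check: RAID 5 gives up one member's worth of space to parity, so usable space is (number of members - 1) x member size. The sketch below assumes five equal members of about 465 GiB each (the partition size this build ends up with); raid5_usable_gib is a made-up helper name, not an mdadm command.

```shell
#!/bin/sh
# Usable RAID 5 capacity = (members - 1) * size of each member.
# raid5_usable_gib is a hypothetical helper used only for illustration.
raid5_usable_gib() {
    members=$1
    size_gib=$2
    echo $(( (members - 1) * size_gib ))
}

# Five 465 GiB partitions, as in this build.
raid5_usable_gib 5 465
```

That works out to 1860 GiB, a bit over 1.8 TiB, which leaves some headroom over the 1.5 TiB I need.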
Definition of RAID 5:
Striped set with distributed parity or interleave parity. Distributed parity requires all drives but one to be present to operate; drive failure requires replacement, but the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user. The array will have data loss in the event of a second drive failure and is vulnerable until the data that was on the failed drive is rebuilt onto a replacement drive.
*Image, definition and Quote below taken from RAID Wikipedia article
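To make the parity idea concrete, here is a minimal sketch in plain shell arithmetic. It assumes the parity for a stripe is a simple XOR across the data chunks, which is how RAID 5 parity is normally described; the byte values and the xor_all helper are made up for illustration and have nothing to do with how mdadm is actually invoked.

```shell
#!/bin/sh
# Illustrative only: four data "chunks" (single bytes here) plus one XOR parity.
# The values and the helper name are hypothetical.

# XOR all arguments together and print the result.
xor_all() {
    acc=0
    for v in "$@"; do
        acc=$(( acc ^ v ))
    done
    echo "$acc"
}

d1=23; d2=200; d3=7; d4=91

# Parity chunk that would live on the fifth drive for this stripe.
parity=$(xor_all "$d1" "$d2" "$d3" "$d4")

# Pretend the drive holding d2 died: XOR of the surviving chunks and the
# parity gives the missing value back.
rebuilt=$(xor_all "$d1" "$d3" "$d4" "$parity")

echo "parity=$parity original_d2=$d2 rebuilt_d2=$rebuilt"
```

The same trick only works for one missing value per stripe, which is why a second drive failure before the rebuild finishes loses data.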
Software RAID vs. Hardware RAID vs. Hardware Fake RAID
When I started down this path with my Abit IP35 Pro XE motherboard (ICH9R), my assumption was that I would create the array with the RAID controller on my motherboard and that my operating system (Windows or Linux) would just see the array as a large hard disk.
These RAID controllers are known as Hardware Fake RAID or Hybrid RAID:
Hybrid RAID implementations have become very popular with the introduction of inexpensive RAID controllers, implemented using a standard disk controller and BIOS (software) extensions to provide the RAID functionality. The operating system requires specialized RAID device drivers that present the array as a single block based logical disk. Since these controllers actually do all calculations in software, not hardware, they are often called “fakeraids”, and have almost all the disadvantages of both hardware and software RAID.
In Windows this would work out just fine most of the time, and the array would be seen as a large drive. My goal, however, was to dedicate this box to the array and to virtual machines (it has 8GB of RAM and a quad-core processor), and I didn't want to run Windows for all kinds of reasons that I won't begin to list. In Linux I had the option to use DMRaid, which would recognize the Hardware Fake RAID and treat it in much the same way Windows would, but after toying with this for a little bit I became increasingly frustrated and dumped the Hardware Fake RAID idea.
Hardware Fake RAID: Fail.
Go out and buy a legit RAID card for several hundred dollars? Hardware RAID:
I could go out and purchase any number of expensive enterprise-level options for RAID, but this would take me substantially over budget. I would also have to think about a contingency if my RAID controller died: purchasing another controller to recover my data would be absolutely miserable, considering I would not be able to plan financially for this hardware malfunction.
Hardware RAID: Fail.
So the only option left is to investigate how to do software RAID in Linux. It would certainly destroy my chances of ever mounting this array from a dual boot into Windows but, as previously mentioned, this machine is a dedicated Linux box… so no problem!
A huge added benefit to using MDADM is that you could rip these drives out and put them into another machine, then do an assemble of the array on that machine and your data is back online.
The only warning I have before you make this decision is that software RAID puts all of the RAID processing on your central processor. This particular machine is more than able to handle what I throw at it, but you will want to do your own analysis.
Software RAID: Success.
Software RAID with MDADM
First off you will want to grab the multi-disk administration tool:
sudo apt-get install mdadm
This will come with some dependencies, most notably citadel, which is straightforward to configure (just say yes). Now you will want to do some prep work on your drives. MDADM works with partitions, so you will want to put equal-sized partitions on all of your array's drives. For my particular application I had 5 x 500WDAAKS drives, so that gave me 5 x 465GiB partitions.
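If you would rather script the partitioning than click through a partition editor, something like the following could work. This is a sketch under assumptions, not what I actually ran: it assumes GNU parted is installed, that each drive gets one MBR partition spanning the whole disk, and that the drives show up as /dev/sdb through /dev/sdf. The print_partition_plan helper is made up, and it deliberately only prints the commands, because mklabel wipes the existing partition table.

```shell
#!/bin/sh
# Prints (does not run) the parted commands for one whole-disk RAID partition
# per drive. Review the output, then run the lines by hand with sudo.
# print_partition_plan is a hypothetical helper; device names are assumptions
# and will differ from machine to machine.
print_partition_plan() {
    for disk in "$@"; do
        # New empty MBR partition table (destroys the old one).
        echo "parted -s $disk mklabel msdos"
        # One primary partition covering the whole disk.
        echo "parted -s -a optimal $disk mkpart primary 0% 100%"
        # Flag partition 1 as a RAID member.
        echo "parted -s $disk set 1 raid on"
    done
}

print_partition_plan /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
```

Double check the device names with something like sudo fdisk -l before running anything it prints, so you do not wipe your system disk.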
I made all the partitions ext3, but it doesn't matter, because once we create the array we will be setting up the file system again. So now let's use MDADM to create the array:
sudo mdadm --create /dev/md0 --chunk=64 --level=5 --raid-devices=5 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
A few things to note:
- /dev/md0 == This is the array device (MD stands for Multi-Disk)
- chunk == This is the chunk size in KiB. I chose 64, but there are many different opinions on what you should use for each RAID type
- level == which RAID type you are using: 0, 1, 4, 5 or 6 (MDADM also supports a few others, such as 10 and linear)
- raid-devices == number of devices that will be in your array
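One easy mistake is letting the raid-devices count drift out of sync with the list of partitions. A small sketch that builds the command line from the list, so the count always matches; build_create_cmd is a made-up helper and it only prints the command rather than running it.

```shell
#!/bin/sh
# Builds the mdadm --create command line from a list of member partitions so
# that --raid-devices always equals the number of devices given.
# build_create_cmd is a hypothetical helper; it prints, it does not execute.
build_create_cmd() {
    array=$1
    level=$2
    chunk=$3
    shift 3
    # After the shift, $# is the number of member partitions left in "$@".
    echo "mdadm --create $array --chunk=$chunk --level=$level --raid-devices=$# $*"
}

build_create_cmd /dev/md0 5 64 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
```

Once the array has really been created, cat /proc/mdstat and sudo mdadm --detail /dev/md0 will show its state and how far along the initial build is.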
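Now mdadm should let you know that /dev/md0 is started, and it will start every time you boot up. You can get the UUID of the file system on the device and set it in your /etc/fstab so that it mounts every time you start. Note that blkid reports the file system's UUID, so the file system has to exist first (see the mkfs step further down). To get the UUID run: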
sudo blkid /dev/md0
and then add it to your /etc/fstab:
# /dev/md0 ARRAY
UUID=18a281fb-bd74-47cd-a0cc-70564526d727 /mnt/ARRAY ext3 relatime 0 0
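If you want to generate that line rather than type it, here is a small sketch. fstab_entry is a made-up helper, and the mount options simply mirror the entry above; on the real machine the UUID could be fetched with sudo blkid -s UUID -o value /dev/md0.

```shell
#!/bin/sh
# Formats an /etc/fstab entry for a file system identified by UUID.
# fstab_entry is a hypothetical helper; options mirror the entry in this post.
fstab_entry() {
    uuid=$1
    mount_point=$2
    fs_type=$3
    printf 'UUID=%s %s %s relatime 0 0\n' "$uuid" "$mount_point" "$fs_type"
}

# The UUID below is the example value from this post, not a real lookup.
fstab_entry 18a281fb-bd74-47cd-a0cc-70564526d727 /mnt/ARRAY ext3
```

The mount point has to exist (sudo mkdir -p /mnt/ARRAY), and sudo mount -a is a handy way to check the new entry without rebooting.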
To set up a file system on the raid array you can run a command like this (for ext3):
sudo mkfs.ext3 /dev/md0
Now you have some redundant storage!
If you are like me and reinstall your operating system often, then mdadm has a great feature where it will just reassemble the array for you:
sudo mdadm --assemble --scan
mdadm also supports emailing you if there is anything wrong with the array. You need a mail server on the machine running the array; then edit /etc/mdadm/mdadm.conf and change the MAILADDR line:
# instruct the monitoring daemon where to send mail alerts
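A minimal sketch of making that change from a script. Everything here is an assumption for illustration: set_mailaddr is a made-up helper, admin@example.com is a placeholder address, and the starting value MAILADDR root is just what I would expect a stock config to contain. The demo works on a scratch file, not the real /etc/mdadm/mdadm.conf.

```shell
#!/bin/sh
# Replaces the MAILADDR line in an mdadm.conf-style file, or appends one if
# the file has none. set_mailaddr is a hypothetical helper; it does not cope
# with addresses containing '|' or '&', which sed would treat specially.
set_mailaddr() {
    conf=$1
    addr=$2
    if grep -q '^MAILADDR' "$conf"; then
        # Rewrite the existing line via a temporary copy.
        sed "s|^MAILADDR.*|MAILADDR $addr|" "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
    else
        printf 'MAILADDR %s\n' "$addr" >> "$conf"
    fi
}

# Demonstrate on a scratch file with assumed starting contents.
scratch=$(mktemp)
printf '# instruct the monitoring daemon where to send mail alerts\nMAILADDR root\n' > "$scratch"
set_mailaddr "$scratch" admin@example.com
cat "$scratch"
rm -f "$scratch"
```

After changing the real file, sudo mdadm --monitor --scan --test --oneshot should send a test alert for each array, which is a quick way to confirm mail delivery actually works.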
Setting up software RAID in Ubuntu Server (TuxTraining)
Fake RAID Howto (Ubuntu Documentation)
Linux Software RAID using MDADM (Ubuntu Forums)
Redundant Array of Independent Disks (Wikipedia)