Thursday, December 8, 2011

Do-it-yourself RAID tips

When planning a new RAID, you want to get a few factors straight.
 
Protection against drive failures.
Keep in mind that RAID is not a substitute for a proper regular backup. RAID cannot save you from user errors or natural disasters like flooding, earthquakes, tornadoes, or whatever is common in your area.

Still, some RAID types will continue working even if one of the array's drives stops working.
These levels include RAID1, RAID10, RAID4, RAID5, RAID6, and exotics like RAID5E, RAID5EE, and RAID DP. It is sometimes argued that RAID5 with a large number of member disks is unreliable; however, these arguments rest on calculations based on vendor-specified URE values, which can be shown to be unrealistic.

Capacity.
The RAID array size is limited by
  • the size of a single hard drive,
  • the maximum number of disks the RAID controller can handle,
  • and the overhead needed for redundancy, if you choose a fault-tolerant array.
Should you need simple array size calculations done for you, have a look at this free RAID Calculator.
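The arithmetic behind such a calculator is simple. Here is a minimal sketch, assuming equal-size disks and the common textbook redundancy overheads for each level:

```python
def usable_capacity(level, n_disks, disk_size_tb):
    """Usable capacity for common RAID levels, assuming equal-size disks.
    A sketch of the arithmetic a RAID calculator performs."""
    if level == "RAID0":
        data_disks = n_disks
    elif level == "RAID1":
        data_disks = 1                  # all disks hold the same data
    elif level == "RAID10":
        data_disks = n_disks // 2       # half the disks are mirrors
    elif level in ("RAID4", "RAID5"):
        data_disks = n_disks - 1        # one disk's worth of parity
    elif level == "RAID6":
        data_disks = n_disks - 2        # two disks' worth of parity
    else:
        raise ValueError("unknown level: %s" % level)
    return data_disks * disk_size_tb

# e.g. six 3 TB disks:
# usable_capacity("RAID5", 6, 3)  -> 15
# usable_capacity("RAID6", 6, 3)  -> 12
# usable_capacity("RAID10", 6, 3) -> 9
```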

Performance requirements.
Of all the fault-tolerant RAID types, those not using parity, namely RAID1 and RAID10, are preferred for random writes. If the RAID is almost always used for reads (for example, a static HTTP server) or as write-only storage (a backup), then RAID levels using parity, namely RAID5, RAID6, and variations thereof, are OK.
If fast random writes are needed, use RAID 1+0.

To quickly learn about performance, price, and redundancy for various array types, look at the "RAID triangle".
Keep in mind that RAID does not decrease random access time. For good access times, use a solid state drive (SSD).

Tuesday, October 11, 2011

Difference between complex layouts - RAID5 and RAID 50/RAID6 and RAID60

Let's compare these RAID levels:
  • RAIDn and RAIDn+0 have the same guaranteed fault tolerance; however, RAIDn+0 arrays have a greater (but not 100%) chance of surviving additional disk failures. You can easily estimate and compare the probabilities of surviving multiple disk failures for different array layouts using a free RAID failure calculator.
  • RAID5 and RAID50 survive one disk failure with 100% probability,
  • RAID6 and RAID6+0 can withstand two member disk failures.
  • When rebuilding, RAIDn+0 shows greater performance than RAIDn.
  • RAIDn+0 overhead grows with the number of RAIDn disk groups, whereas in RAIDn the overhead is fixed and does not increase as the number of disks increases.
Thus, RAIDn is the more appropriate choice for small arrays (in terms of number of disks), while RAIDn+0 is better suited for large data storage.
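The "greater but not 100%" survival chance can be worked out with basic combinatorics. A sketch for RAID50, assuming two simultaneous, independent disk failures (the array survives only if the failures land in different RAID5 groups):

```python
from math import comb

def raid50_two_failure_survival(groups, disks_per_group):
    """Probability that a RAID50 of `groups` RAID5 sets survives two
    simultaneous disk failures: both failures must hit different groups."""
    total = groups * disks_per_group
    same_group_pairs = groups * comb(disks_per_group, 2)
    return 1 - same_group_pairs / comb(total, 2)

# Eight disks as plain RAID5 (one group): two failures are always fatal.
# raid50_two_failure_survival(1, 8) -> 0.0
# The same eight disks as RAID50 (2 groups of 4):
# raid50_two_failure_survival(2, 4) -> 4/7, about 0.57
```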

Tuesday, September 13, 2011

Do you know that RAID 1 is not as reliable as you think?

When using RAID1, you need to periodically read data from the second disk as well. If you do not, one or even several bad sectors may silently develop on the second member disk of the RAID1.

In this case the RAID1 no longer has fault tolerance, and on top of that you do not know about it. This situation usually occurs when the RAID1 is idle most of the time. To make sure fault tolerance is still there, you need to read data from both hard disks and compare it. Heavily loaded arrays do not have this problem.

It can be shown that in a Windows software RAID1, data is typically read from the first disk only. This quirk makes an idle Windows RAID1 unreliable.

Interestingly, hot spare drives are subject to the same problem; therefore they need to be tested as well.
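A periodic read-and-compare pass of the kind described above can be sketched as follows. The paths and chunk size are hypothetical; on a controller-based array you would use the controller's own verify/scrub function rather than reading raw devices:

```python
def scrub_mirror(path_a, path_b, chunk_size=1024 * 1024):
    """Read both RAID1 members end to end and report the indices of chunks
    that differ (or come back short). Any reported chunk means the mirror
    is suspect and needs attention."""
    bad = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:          # both members fully read
                break
            if a != b:                   # mismatch or short read
                bad.append(index)
            index += 1
    return bad
```

Usage would be something like `scrub_mirror("/dev/sdb", "/dev/sdc")` (device names are examples), run from cron on an otherwise idle array.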

Wednesday, August 24, 2011

What are the chances to recover a NAS device using Linux OS?

At www.nasrecovery.info, Linux is suggested as the first thing to try when recovering a NAS. In practice, it does not work at all, or at least you cannot obtain any meaningful results.

Let's give an example: it sometimes happens that you assemble an array with the md-raid driver in VMware, only to see that one member disk has dropped out after a restart. Since it is well known that VMware provides infinitely reliable virtual hardware, the only remaining conclusion is that md-raid is buggy. Thus, there is very little chance of assembling an array holding valuable data with the help of Linux. Keep in mind that it makes sense to try at least several different Linux versions, for, as of now, quite a number of Linux distributions are available at the official Linux web site.

By the way, some of you will be lucky, although it is not mentioned on www.nasrecovery.info: some NASes run on Windows Embedded.

Tuesday, August 9, 2011

NAS and backup power

I saw an interesting post, "Test your RAID", at www.raidtips.com, where it is recommended to pre-test fault tolerance. One of the suggested approaches is a simulated power failure.
However, just turning off the power is not enough to test a UPS. In normal operation, the NAS receives notifications from its UPS unit continuously and takes action immediately. During startup, however, power-failure reports are not received. On top of that, the entire cycle (from startup to shutdown) may take up to ten minutes.

Let's consider such a situation:
  1. suddenly power fails,
  2. your NAS unit takes power from the battery for some time and then shuts down,
  3. the grid power comes back and NAS begins to start up,
  4. at this moment power fails again.
The NAS cannot process a power-failure report until it has fully started up. As soon as the NAS loads, it detects that a power failure has occurred and starts to shut down.
However, it may happen that the system cannot shut down correctly, because the UPS no longer supplies power, its battery having been exhausted during the first cycle.
One way to avoid this is to set up the NAS so that it shuts down as soon as it receives a power-failure message. This saves enough battery charge for another start-stop cycle.
Alternatively, you can configure the NAS so that it does not restart automatically after a power failure, but only manually.

Monday, August 1, 2011

When you need to measure the performance of your hard drive.

If you think a storage device has poor performance, first get data on linear read speed, access time, and IOPS (I/O operations per second) for the device in question.

All this information can be collected with any benchmark tool for storage devices. There are both paid and free benchmark tools. I used a free and easy-to-use one, BenchMe, which shows linear read speed in real time. It also produces a chart with the distribution of access times for the storage device, and displays the features the device supports.

On the developer's site one can find sample benchmark charts made by BenchMe for various data storage devices - typical hard drives, SSDs, and RAID arrays.
The only disadvantage I came across is that the tool does not handle hard disks connected via USB.
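To give an idea of what a linear read benchmark measures, here is a minimal sketch. It captures far less than BenchMe does, and the path is whatever large file or raw device (admin rights required for the latter) you point it at:

```python
import time

def linear_read_speed(path, total_bytes=256 * 1024 * 1024,
                      chunk_size=1024 * 1024):
    """Rough sequential read speed in MB/s, reading from the start of `path`.
    Only a sketch: no cache flushing, no per-position breakdown."""
    start = time.monotonic()
    read = 0
    with open(path, "rb") as f:
        while read < total_bytes:
            chunk = f.read(chunk_size)
            if not chunk:                # end of file/device
                break
            read += len(chunk)
    elapsed = max(time.monotonic() - start, 1e-9)  # guard against zero
    return read / (1024 * 1024) / elapsed
```

Note that the OS page cache will inflate the result on repeated runs; real benchmark tools bypass the cache.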

Wednesday, July 27, 2011

Unformat vs. Undelete

There are two well-known types of data recovery - unformat and undelete.

The term "undelete" almost always describes the recovery of one file, or a couple of files, from an otherwise properly functioning filesystem.
"Unformat" typically describes the recovery of multiple files off a massively damaged volume when none of the data is accessible any longer.
The "undelete" process relies on the fact that the volume is mostly in good condition, except for the sought-for files.

Unlike the above, during an "unformat" the data recovery software must be able to handle massive inconsistencies in the partition, since even a quick format overwrites parts of the filesystem.

It is in most cases useless to run additional file recovery software on files already recovered by unformat. The "undelete" is usually very quick and in most cases yields more usable recovered files.

Tuesday, May 3, 2011

How to use hot spare disks properly?

A hot spare is a good option, but the hard drive used as the hot spare ought to be tested every now and again.
Consider this situation: you decide to build your own RAID array and put N disks (including the hot spare) into it.
There is then a 1/N probability that the first disk to fail happens to be the hot spare.

Then, when one of the RAID array member disks fails, you are surprised to learn that the hot spare is unable to replace it.
To avoid this, periodically check the hot spare disk, or stick to RAID5E/EE or RAID6E array levels, where the spare capacity is part of the working array and is exercised in normal operation.
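The 1/N figure is easy to sanity-check with a quick simulation, assuming all N disks (one of them the spare) are equally likely to fail first:

```python
import random

def first_failure_is_spare(n_disks, trials=100_000, seed=1):
    """Monte Carlo check: if all n_disks disks are equally likely to fail
    first, disk 0 (say, the hot spare) is the first casualty with
    probability 1/n_disks."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randrange(n_disks) == 0)
    return hits / trials

# For a 5-disk array (4 members + 1 spare) the result is close to 0.2.
```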

Monday, April 4, 2011

URE - real or not?

When studying hard drive documentation provided by vendors, you can easily note that manufacturers often give, to put it mildly, unrealistic URE (Unrecoverable Read Error) values.

This URE specification is commonly used to support bogus conclusions like "RAID5 is dead", or to derive the chances of a double read error in RAID5. The problem is, the vendor values are very far off the mark.

Read the documentation on the Hitachi website, where very interesting URE values are given, e.g. for a 3 TB disk, one error per 10^14 bits read. Let's assume this value is real. Then, if you buy this disk and read it from beginning to end, the probability of not encountering a read error is:

(1 - 10^-14)^(8 * 3 * 10^12) ≈ 0.79

so the probability that the hard disk fails to read at least one sector is about 21 percent.

All the above means that with a completely filled disk, there is roughly a one-in-five chance that you cannot read all the data off it in a single pass. Everyday usage obviously demonstrates that this is wrong.
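The calculation above is easy to reproduce. A small sketch, taking the vendor URE specification at face value:

```python
import math

def p_full_read_succeeds(capacity_tb, ure_per_bit=1e-14):
    """Probability of reading an entire drive with no unrecoverable error,
    given a URE rate of `ure_per_bit` errors per bit read (the vendor spec
    of 1 error per 1e14 bits, by default)."""
    bits = capacity_tb * 1e12 * 8           # decimal TB to bits
    # log1p avoids precision loss when raising (1 - tiny) to a huge power
    return math.exp(bits * math.log1p(-ure_per_bit))

# p_full_read_succeeds(3) -> ~0.787, i.e. roughly a 21% chance of at least
# one unreadable sector during a single full read of a 3 TB disk.
```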

Monday, March 14, 2011

Can RAID 1 (mirror) improve average access time?

Although RAID in general is said not to improve access time, RAID 1 can improve the average access time during reads (by reducing seek time, not rotational latency).

As you know, RAID 1 has two member disks holding two identical copies of the data.

When read operations are grouped as described below:
  • the first half of the data is served by the first disk,
  • the second half of the data is served by the second disk,
then the average distance the read head must travel to reach a sector is cut in half.
This approach cannot improve rotational latency. Still, performance improves because less head travel is needed. As for writes, no improvement is possible, since both disks must be updated.
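The halving of head travel can be checked with a simple model: the mean distance between two uniformly random positions on a span is one third of the span, so halving the span served by each disk halves the travel. A sketch:

```python
import random

def mean_seek_distance(span, trials=200_000, seed=7):
    """Average distance between two uniformly random positions in [0, span],
    a crude model of head travel; the expected value is span / 3."""
    rng = random.Random(seed)
    return sum(abs(rng.uniform(0, span) - rng.uniform(0, span))
               for _ in range(trials)) / trials

# Whole disk served by one head:        mean_seek_distance(1.0) -> ~0.333
# Each mirror serving half the disk:    mean_seek_distance(0.5) -> ~0.167
```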

Tuesday, March 1, 2011

Why are only the image thumbnails recovered?

More often than not, when recovering digital images, typically off a memory card, the image thumbnails are extracted OK, while the high-resolution images themselves are not.
This is caused by "file fragmentation", when a file is stored on the disk in several non-contiguous parts. A sketch illustrating the process can be seen at the Photo Recovery Limitations page.

As a rule, memory cards are formatted to FAT or FAT32. If a file is erased on FAT, only the information needed to locate its first fragment remains available; the second and following fragments cannot be located.
The thumbnail is small and stored near the start of the file, so it can often be recovered successfully, while the full-size image is lost. Of all the recovery types, this limitation mostly applies to digital image recovery and unformat on FAT.

Thursday, February 10, 2011

"Do you want to format it?"

It may happen that if you disconnect an external hard drive (or memory card) without using the "safely remove hardware" option, the next time you connect the drive Windows will ask, "Do you want to format it?"
In most cases, such behavior means you have a RAW filesystem. Most likely, the original file system was corrupted because data still in the buffer was lost when the storage device was disconnected.

You can easily solve the problem by just formatting the drive, but then the data would be irreversibly gone. If the data is important, you should retrieve it first and perform the format afterwards. Data recovery from a RAW filesystem is similar to unformat, but typically more effective. It is very easy to do; just try any data recovery utility.

Thursday, January 6, 2011

Secure Erase at home

The best way to wipe the content of a drive is to destroy it physically: either melt it in a fireplace or drill it through in many places. The problem is, nobody is going to buy the disk after that.

Software-wise, there are quite a few tools, both free and paid, that will overwrite the data with zeros or a random sequence for you. Once overwritten, the information is lost forever.
To get the same effect as the secure erase software yourself, just format the hard drive and then fill it to full capacity with any non-sensitive data (e.g. many copies of a DVD ISO). When no more big files fit, continue adding smaller files.
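The fill-with-junk approach can be sketched as below. The `budget` parameter is an added safety valve for experimentation (set it to the disk capacity in practice); note this only overwrites free space, which covers the whole disk because the drive was formatted first:

```python
import os

def fill_with_junk(target_dir, budget, chunk_size=64 * 1024 * 1024):
    """Write junk files into target_dir until the disk is full or `budget`
    bytes have been written, shrinking the chunk size when a write fails.
    Returns the number of bytes written."""
    written = 0
    n = 0
    while chunk_size >= 4096 and written + chunk_size <= budget:
        path = os.path.join(target_dir, "junk_%06d.bin" % n)
        try:
            with open(path, "wb") as f:
                f.write(b"\0" * chunk_size)  # any non-sensitive pattern works
            written += chunk_size
            n += 1
        except OSError:                      # disk full: halve and keep going
            try:
                os.remove(path)
            except OSError:
                pass
            chunk_size //= 2
    return written
```

Remember that modern drives remap failing sectors, so software overwriting (including this sketch) cannot touch data stranded in remapped sectors.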

On Full Disk Encryption drives, just changing the password achieves pretty much the same effect as a secure erase.