RAID: Enjoy the power of arrays of... drive letters!

By Jonas Buys

OS/2 and hardware... For more than a decade, OS/2 has been misjudged as having very limited hardware support. In fact, if you dig a little deeper, you'll notice that OS/2 Warp and eComStation offer extremely broad and powerful hardware support.

RAID: to a lot of OS/2 and eCS users this term sounds extremely mysterious and remote. However, this technology can really give your PC or server a huge performance boost, and it can protect you from losing data.

This article will introduce this technology, discuss all options thoroughly, and list a lot of compatible devices and chipsets. It is intended for experienced users who are familiar with OS/2 FDISK and the OS/2 Logical Volume Manager.

1.1. Introduction
RAID (Redundant Array of Inexpensive (or Independent) Disks) is an architectural concept developed to turn relatively slow and inexpensive hard disk drives into fast, large-capacity, and more reliable storage systems. The RAID concept was introduced in 1987 by Patterson, Gibson and Katz, a team of researchers at the University of California, Berkeley. RAID systems derive their speed from striping data across multiple disks, placing successive pieces of a file on different disks, thus allowing parallel data access. Reliability is generally achieved through replication (called mirroring) or by using error detection and correction schemes across the disk array. (Mirroring is, in fact, a technique that predates RAID, being used, for example, in the IBM AS/400 minicomputer.)

There are many levels of RAID, which differ in the way they provide for speed and/or reliability. The original Berkeley work specified RAID levels 1 through 5; level 0 (plain striping without redundancy) was named later. Slight modifications to these levels have more recently resulted in the specifications of levels 6 and 7, and other, newer RAID levels have been published as proprietary standards.

1.2. RAID Levels
Before we can start discussing RAID implementations on the OS/2 Warp platform family, we need to know the difference between a logical drive and a physical drive.

A physical drive is something you can touch, or a view of how a real hard disk is split into pieces (called partitions): it can be your hard disk, but mostly it's a partition. A logical drive can be the same as a single hard disk or partition, but it doesn't need to be. It can be a set of partitions (disk spanning) located on one or several hard disk drives; it is simply an array of independent physical drives. With RAID, the concept of a logical volume comes to mind, which is very similar to a logical drive. A logical volume is composed of one or several logical drives; the member logical drives can be of the same RAID level or of different RAID levels. The logical volume can be divided into a maximum of 8 partitions. During operation, the host sees a non-partitioned logical volume, or a partition of a partitioned logical volume, as one single physical drive. Thus we "merge" a set of partitions or hard disks to end up with one large super-volume.

Note that IBM's Logical Volume Manager, first introduced in OS/2 Warp Server for e-Business, allows us to create, use and benefit from disk spanning and logical drives. Earlier we were stuck with the partitions we created with OS/2 FDISK. As an illustration, see the image at the left. As you can see, Drive1 contains three partitions (labeled C:, D: and E:). Drive2 also contains three partitions (labeled E:, X: and another instance of E:). Thus, physically, these two disks have six partitions, and under FDISK each E: instance would get a different drive letter. Suppose C: is the only primary partition in the system. Then you would have the following drive letters in your Drives folder: C:, D:, E: for E: on Drive1 in the image, F: for the first instance of E: on Drive2, G: for X: and H: for the second instance of E: on Drive2. This is how the operating system would see it using FDISK. Notice there are three occurrences of E: in the figure. When using OS/2 with LVM, we see the following drives in the Drives folder: C:, D:, E: and X:. E: is a logical volume that we created using LVM and that consists of three partitions. However, the fact that the volume consists of partitions is hidden from the user.

No RAID
Some terms were invented to describe non-RAID configurations.

NRAID
NRAID stands for Non-RAID. The capacity of all the drives is combined to become one logical drive (no block striping). In other words, the capacity of the logical drive is the total capacity of the physical drives. NRAID does not provide data redundancy. You could see LVM's disk spanning as NRAID, but it can provide RAID too, of course. Click [here] to get an image representing NRAID.
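The way NRAID adds up capacities without striping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `locate_block_nraid` and the disk sizes are made up for the example, and real controllers or LVM may lay out data differently.

```python
def locate_block_nraid(block, disk_sizes):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes plain concatenation: disk 0 is filled first, then disk 1, etc.
    """
    if block < 0:
        raise ValueError("block number must be non-negative")
    for disk, size in enumerate(disk_sizes):
        if block < size:
            return disk, block
        block -= size  # skip past this disk and try the next one
    raise ValueError("block lies beyond the end of the logical drive")


# Hypothetical disk sizes, expressed in blocks.
sizes = [100, 50, 200]
capacity = sum(sizes)  # the logical drive is as large as all disks together
```

Note that consecutive blocks stay on the same disk until that disk is full, so there is no parallelism to gain, in contrast to the striping used by RAID-0.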

JBOD
JBOD stands for Just a Bunch of Drives. The RAID controller treats each drive as a stand-alone disk; therefore each drive is an independent logical drive. JBOD does not provide data redundancy. Click [here] to get an image representing JBOD.

RAID-0: Striping (no fault tolerance)
RAID-0 only stripes (splits) the data across the disks of the array, which achieves speed through parallelism. Access time, however, remains the same. Data is written alternately to all disks. RAID-0 does NOT improve reliability. As indicated in the image at the right, each data object is striped across all disks in the array. The danger of data loss is greater than with a single disk, because the loss of a single disk makes the whole subsystem worthless: should one disk become unusable, all data is lost, and a file cannot be retrieved from the remaining drives, since parts of it are missing. RAID-0 should never be used in mission-critical environments. Recommended applications are video/image production and editing, pre-press applications and other applications requiring high bandwidth.
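The alternating placement of data on the disks can be sketched as a simple address calculation. This is a minimal sketch, not the method of any particular controller: the function name `locate_block_raid0` and the `stripe_unit` parameter (the number of consecutive blocks placed on one disk before moving to the next) are assumptions made for the example.

```python
def locate_block_raid0(block, num_disks, stripe_unit=1):
    """Map a logical block number to (disk index, block offset on that disk).

    Chunks of `stripe_unit` blocks are handed out to the disks in turn.
    """
    chunk, within = divmod(block, stripe_unit)
    disk = chunk % num_disks                       # round-robin over the disks
    offset = (chunk // num_disks) * stripe_unit + within
    return disk, offset


# With three disks and a stripe unit of one block, logical blocks 0, 1, 2
# land on disks 0, 1, 2 and can be transferred in parallel.
placement = [locate_block_raid0(b, num_disks=3) for b in range(6)]
```

Because every disk holds only a fraction of each large file, the sketch also shows why losing one disk damages practically every file in the array.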

1.2.3.1. RAID-1: Mirroring
Unlike RAID-0, RAID-1 focuses on reliability through redundancy rather than on speed. In this configuration two (or more) sets of disks are used, primary and secondary, where the secondary disks maintain an identical image of the primary disk data (which may itself be striped across multiple disks as in RAID-0). Thus, if a disk in one set fails, one in the other set can replace it, since all disks contain exactly the same data. This is clearly an expensive arrangement, as the usable capacity drops to 50% or less. Furthermore, while reads can be faster since they can be served by both disks simultaneously (each disk reading different blocks), writes are slow, as there is always the need to write two copies. One problem occurs when both disks report different data (write error): which disk is correct? Some vendors use the term disk duplexing for mirroring across two or more controllers, so that there is redundancy even if a controller fails. To increase data security, RAID 1 is often combined with some other RAID level. RAID 1 can only be performed with two hard drives. If there are more than two hard drives, RAID (0+1) will be performed automatically. Recommended applications are accounting, payroll, financial apps, and other applications requiring very high availability.
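The behaviour of a mirror set can be illustrated with a toy model. This is only a sketch: the class `Mirror` and its methods are hypothetical names, each disk is modelled as a Python dictionary, and the "which copy is correct?" problem mentioned above is deliberately left out.

```python
class Mirror:
    """Toy RAID-1 set: every write goes to all healthy copies."""

    def __init__(self, num_copies=2):
        self.disks = [dict() for _ in range(num_copies)]
        self.failed = set()

    def fail(self, index):
        """Mark one copy as unusable."""
        self.failed.add(index)

    def write(self, block, data):
        # The cost of mirroring: one logical write becomes several writes.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def read(self, block):
        # Any healthy copy will do; here we simply take the first one.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise OSError("all mirror copies have failed")


mirror = Mirror()
mirror.write(7, b"payroll")
mirror.fail(0)                 # the primary disk dies
survivor_data = mirror.read(7)  # still served by the secondary
```

A real implementation could spread reads over both copies to gain speed, which is where the read advantage described above comes from.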

1.2.3.2. RAID (0+1): Disk Striping with Mirroring
RAID (0+1) combines RAID 0 and RAID 1: mirroring and striping. Because of the full redundancy of the hard drives, it allows multiple drive failures and concurrent multiple drive rebuilds. RAID (0+1) will mostly not appear in the list of RAID levels supported by the RAID controller: if you wish to perform RAID-1, the controller will determine whether to perform RAID 1 or RAID (0+1), depending on the number of drives selected for the logical drive. If more than two hard drives are assigned to RAID-1, RAID (0+1) is performed automatically. Recommended application domains are imaging applications and low-end file servers.

1.2.3.3. RAID-10: Mirroring and Striping
RAID-10 is nearly the same as RAID (0+1), with the difference that the system first mirrors and then stripes the data (in contrast to 0+1, which stripes first and then mirrors). Recommended for database servers requiring high performance and fault tolerance.
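The order of mirroring and striping matters once disks start to fail. The following sketch counts, for a four-disk array, how many of the possible two-disk failures each arrangement survives. It assumes the usual reading of the two names (RAID-10 as a stripe over mirrored pairs, RAID (0+1) as a mirror of two stripe sets); the grouping of the disks and the function names are illustrative only.

```python
from itertools import combinations

# Four disks, numbered 0..3, grouped two by two.
GROUPS = [(0, 1), (2, 3)]


def raid10_survives(failed):
    """Each group is a mirrored pair; data is striped over the pairs.
    The array lives as long as every pair keeps at least one disk."""
    return all(any(d not in failed for d in pair) for pair in GROUPS)


def raid01_survives(failed):
    """Each group is a stripe set; the two sets mirror each other.
    The array lives as long as one stripe set is completely intact."""
    return any(all(d not in failed for d in stripe) for stripe in GROUPS)


two_disk_failures = [set(c) for c in combinations(range(4), 2)]
raid10_ok = sum(raid10_survives(f) for f in two_disk_failures)
raid01_ok = sum(raid01_survives(f) for f in two_disk_failures)
```

Under these assumptions RAID-10 survives four of the six possible double failures and RAID (0+1) only two, while both survive any single failure.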

1.2.4. RAID-2: Stripe Set
In RAID-2, each block of data is striped across the data disks with a stripe unit of either a bit or a byte to allow parallel access, as in RAID-0. All the drive spindles have to be synchronized using a single actuator or multiple coupled actuators, as the data bits must be read in parallel. Error detection and correction is provided by additional disks, whose data is created using a Hamming error-correcting code, to allow recovery from a single disk failure. Because separate disks are used to store the Hamming code, RAID-2 is slow. RAID-2 performs best for large transfers, in which the seek time is rapidly amortized. It would deliver a performance close to that of a single disk for a short transaction, and would perform poorly, compared with independent disks, when doing a number of concurrent short transactions. Modern disks use another kind of error correction code by themselves, and RAID-2 is now obsolete: no commercial implementations exist, as it is not commercially viable.
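To give an idea of how a Hamming code locates and repairs a bad bit, here is a sketch of the classic Hamming(7,4) code. It assumes, purely for illustration, an array of four data disks and three check disks where every disk contributes one bit to each 7-bit word; the function names are made up, and actual RAID-2 designs used wider codes.

```python
def hamming74_encode(data_bits):
    """Turn 4 data bits into a 7-bit word (positions 1..7).
    Positions 1, 2 and 4 hold check bits, the others hold data."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # checks positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # checks positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # checks positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(word):
    """Return (data bits, position of the repaired bit or 0 if none)."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s4   # the syndrome names the bad position
    if position:
        w[position - 1] ^= 1          # flip the faulty bit back
    return [w[2], w[4], w[5], w[6]], position


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                        # the disk at position 5 returns a wrong bit
repaired, bad_position = hamming74_correct(damaged)
```

The three check bits are what cost RAID-2 its extra disks: here three out of seven disks carry no user data.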

1.2.5. RAID-3: Bit/Byte Stripe Set
RAID-3 is a performance-optimized alternative to RAID-2 in which a single parity bit is used instead of the Hamming code, reducing the number of error detection/correction disks to one. The parity is computed with an XOR operation. If the parity bit indicates an error, the disk controllers can identify the faulty disk on their own, without the need for an error-correcting code to locate the problem. The faulty disk can then be replaced and its data reconstructed using the remaining good disks and the parity disk. RAID-3 is fast, but its speed is reduced when transferring small, non-contiguous blocks. It is good for large transfer sizes.
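The reconstruction of a failed disk from the remaining disks and the parity disk follows directly from the properties of XOR. A minimal sketch, with made-up block contents and a hypothetical helper `xor_blocks`:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equally long byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, each holding one 4-byte block of the same stripe.
data = [b"RAID", b"on  ", b"OS/2"]
parity = xor_blocks(data)              # contents of the parity disk

# Disk 1 fails: XOR everything that is left, including the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Since XOR-ing a value twice cancels it out, the bytes of the surviving data disks drop out of the result and only the contents of the missing disk remain.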

Recommended applications are video/image production and editing, live streaming, prepress apps and any application requiring high throughput.

1.2.6. RAID-4: Block/Sector Striping, Dedicated Parity Drive
RAID-4 is similar in configuration to RAID-3, except that the unit used for data striping is the block (sector) rather than the bit or the byte. With the larger stripe unit, independent disk drive actuators are used and multiple small transactions can proceed concurrently, speeding up read transactions. The checksum, calculated by XOR operations, is stored on a separate disk. Multiple small independent writes must each update this parity disk, which creates a sequential bottleneck. Because of this bottleneck, RAID-4 isn't as fast as expected, at least when writing or in case of a disk failure (you don't need the XOR values when reading).
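Why every small write has to touch the parity disk can be seen from the usual read-modify-write update, in which the new parity is derived from the old parity, the old data and the new data. The sketch below assumes this update scheme; the helper names and block contents are invented for the example.

```python
def xor_bytes(a, b):
    """XOR two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(disks, parity, index, new_data):
    """Overwrite one data block and return the updated parity block.

    Only the target disk and the parity disk are accessed; the other
    data disks are not read at all.
    """
    old_data = disks[index]
    new_parity = xor_bytes(xor_bytes(parity, old_data), new_data)
    disks[index] = new_data
    return new_parity


disks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_bytes(xor_bytes(disks[0], disks[1]), disks[2])
parity = small_write(disks, parity, 1, b"ZZZZ")
```

Two small writes to different data disks could proceed in parallel as far as the data is concerned, but both must read and rewrite the same parity disk, which serializes them.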

1.2.7. RAID-5: Sector Striping, Distributed Checksums
(Striping with Interspersed Parity)

This RAID level combines features from RAID-0, RAID-3 and RAID-4, but it spreads the parity information. Data is striped in sector-sized units over several (at least three) disks. Error correction is achieved by using XOR operations. The calculated checksum for parity isn't saved on a separate disk but distributed along with the data sectors (each of the data disks contains some of the parity blocks). Therefore there is no single disk acting as a bottleneck. RAID-5 combines high security and good performance, and is used for these reasons. Because the XOR operation over large amounts of data still takes its time, RAID-5 is often implemented as hardware RAID, with a special controller using a separate unit to calculate it.

RAID-5 is therefore the most suitable level for transaction processing, file and application servers, WWW, e-mail, news and intranet servers, and database applications. There is an overhead, however, associated with tracking where the relevant parity information is stored for a given data manipulation. A variety of strategies exist to evenly distribute data units and parity units; this illustration uses the left-symmetric layout.
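The left-symmetric layout mentioned above can be generated with a few lines of code. This sketch follows the common definition of that layout (parity starts on the last disk and moves one disk to the left per stripe, and the data blocks of a stripe start right after the parity block and wrap around); the function name and the labels "P" and "D0", "D1", ... are just for illustration.

```python
def raid5_left_symmetric(stripe, num_disks):
    """Return one row of the layout: what each disk holds in this stripe."""
    parity_disk = (num_disks - 1) - (stripe % num_disks)
    row = [None] * num_disks
    row[parity_disk] = "P"
    first_block = stripe * (num_disks - 1)   # each stripe holds n-1 data blocks
    for i in range(num_disks - 1):
        disk = (parity_disk + 1 + i) % num_disks
        row[disk] = f"D{first_block + i}"
    return row


layout = [raid5_left_symmetric(s, 4) for s in range(4)]
for row in layout:
    print("  ".join(f"{cell:>3}" for cell in row))
```

Reading the printed rows column by column shows that each of the four disks holds exactly one parity block per four stripes, and that consecutive data blocks always fall on consecutive disks.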

1.2.7.1. RAID-53: High I/O Rates and Data Transfer Performance
RAID level 53 should actually be referred to as RAID level 03, since it is implemented as a striped (RAID-0) array whose segments are RAID-3 arrays. RAID 53 offers the same fault tolerance as RAID-3. High data transfer rates are achieved thanks to its RAID-3 array segments, and high I/O rates are typical because of the RAID-0 striping.

1.2.8. RAID-6 and RAID-7
RAID-5 is the last level in the Berkeley-defined RAID architecture. Some modifications introduced by various implementations resulted in defining more levels. RAID-6 is an extension of level 5 in which the disks are arranged in a two-dimensional array, and the parity is determined in each dimension separately. RAID-7, in addition, uses dynamic mapping where each block of data does not always have to be stored in the same physical sector of a disk.

1.2.8.1. RAID-6: Independent Data disks with two independent distributed parity schemes


RAID-6 is essentially an extension of RAID level 5 which allows for additional fault tolerance by using a second independent distributed parity scheme (two-dimensional parity). Data is striped on a block level across a set of drives, just like in RAID 5, and a second set of parity is calculated and written across all the drives. RAID 6 provides extremely high data fault tolerance and can sustain multiple simultaneous drive failures. It is a perfect solution for mission-critical applications.
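The idea of two-dimensional parity can be illustrated with a small grid of values that carries one XOR parity per row and one per column. This is a simplified model, not the algorithm of an actual RAID-6 product (many implementations compute the second parity in a different way); the function names and the grid contents are assumptions for the example.

```python
def parity_2d(grid):
    """Return (row parities, column parities) of a rectangular grid of ints."""
    rows = [0] * len(grid)
    cols = [0] * len(grid[0])
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            rows[r] ^= value
            cols[c] ^= value
    return rows, cols


def recover_2d(grid, row_parity, col_parity):
    """Fill in lost cells (marked None) wherever a row or a column
    has exactly one cell missing, repeating until nothing changes."""
    grid = [list(row) for row in grid]
    progress = True
    while progress:
        progress = False
        for r, row in enumerate(grid):
            missing = [c for c, v in enumerate(row) if v is None]
            if len(missing) == 1:
                value = row_parity[r]
                for v in row:
                    if v is not None:
                        value ^= v
                row[missing[0]] = value
                progress = True
        for c in range(len(grid[0])):
            missing = [r for r in range(len(grid)) if grid[r][c] is None]
            if len(missing) == 1:
                value = col_parity[c]
                for r in range(len(grid)):
                    if grid[r][c] is not None:
                        value ^= grid[r][c]
                grid[missing[0]][c] = value
                progress = True
    return grid


original = [[1, 2, 3], [4, 5, 6]]
row_parity, col_parity = parity_2d(original)
damaged = [[None, None, 3], [4, 5, 6]]      # two losses in the same row
restored = recover_2d(damaged, row_parity, col_parity)
```

With row parity alone the two lost values in the first row could not be recovered; the column parities supply the second, independent equation.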

1.2.8.2. RAID-7: Optimized Asynchrony for High I/O Rates as well as High Data Transfer Rates


Overall write performance is 25% to 90% better than single spindle performance and 1.5 to 6 times better than other array levels. Host interfaces are scalable for connectivity or increased host transfer bandwidth and small reads in multi-user environments have very high cache hit rate resulting in near zero access times. The write performance improves with an increase in the number of drives in the array; access times decrease with each increase in the number of actuators in the array.

There are no extra data transfers required for parity manipulation. RAID-7 is a registered trademark of Storage Computer Corporation.

1.3. Levels Comparison
A RAID array is a set of hard disk drives that the operating system sees as one logical drive. RAID-3 and RAID-5 are the two most popular levels of RAID. Which is better depends upon the application. RAID-5 is more suitable for database applications, where small concurrent transactions can be supported efficiently. RAID-3, however, is best suited for large data accesses such as those found in scientific computing applications.
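One practical way to compare the levels is the amount of usable space they leave. The sketch below uses the usual rules of thumb for arrays of equally sized disks and makes several assumptions: mirrored levels keep exactly two copies, levels 3, 4 and 5 give up one disk's worth of space to parity, and RAID-6 gives up two. The function name and the level labels are illustrative.

```python
def usable_capacity(level, num_disks, disk_size):
    """Usable space of an array of `num_disks` disks of `disk_size` each."""
    if level == "0":
        data_disks = num_disks            # no redundancy at all
    elif level in ("1", "0+1", "10"):
        if num_disks % 2:
            raise ValueError("two-way mirroring needs an even number of disks")
        data_disks = num_disks // 2       # every block is stored twice
    elif level in ("3", "4", "5"):
        data_disks = num_disks - 1        # one disk's worth of parity
    elif level == "6":
        data_disks = num_disks - 2        # assumed: two parity blocks per stripe
    else:
        raise ValueError(f"unknown RAID level: {level}")
    if data_disks < 1:
        raise ValueError("not enough disks for this RAID level")
    return data_disks * disk_size


# Four hypothetical disks of 100 GB each.
summary = {level: usable_capacity(level, 4, 100) for level in ("0", "10", "5", "6")}
```

For the four-disk example this gives 400 GB for RAID-0, 300 GB for RAID-5 and 200 GB for both RAID-10 and RAID-6, which shows the price paid for each kind of redundancy.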

The next table compares the different RAID levels in more depth.

Besides allowing you to continue working with a failed disk, RAID features some additional techniques to restore redundancy after a failure:

Hot Swapping: Replace a failed disk while keeping the machine up. The failed device is stopped and exchanged, and the replacement is automatically set up and filled with the data once stored on the failed device. There is no need to shut down a running server. Be aware that hot swapping needs special connectors to avoid electrical damage.

1.4. Possible approaches to RAID
All I/O controllers must perform a number of functions to complete an I/O transaction. It is the way in which these necessary functions are performed that separates the competition. There are two fundamental design differences: hardware RAID and software RAID.

The internal architectural features of the host CPU depend upon uninterrupted utilization of its internal caches, instruction pipelines and predictive branch algorithms. Many of these performance-enhancing features are defeated when the host CPU must process interrupts from software RAID. This applies equally to software RAID and hardware RAID.

The controller based hardware solution
The hardware-based system manages the RAID storage subsystem independently from the host operating system and presents it to the host as a single disk. This way the host doesn't have to be aware of the RAID storage subsystem, since the intelligent controller manages it on its own. Some manufacturers also distinguish between internal and external RAID controllers. However, since they work the same way, we won't discuss that here.

People often draw a sharp distinction between IDE and SCSI hardware RAID solutions. SCSI is perfectly suited for RAID applications, due to its very high bandwidth, reliability and performance, and because it can access multiple disks simultaneously. IDE RAID controllers, however, can only access one disk after another when writing data to the disks. High prices for SCSI solutions have resulted in a separation of the market: IDE solutions for home computers, SCSI solutions for high-end business servers.

Software RAID
Software RAID utilizes the host CPU (the processor on the motherboard) for any necessary processing functions. This method involves more interrupts to the host CPU, and many more CPU cycles. Software RAID means that the processing is NOT performed on the controller itself; in fact, there is often no controller present: everything can be emulated by the operating system, replacing the RAID controller with one or more device drivers. While this may save a small amount of cost, it is not desirable from an overall system performance standpoint. The performance characteristics of software RAID controllers can be somewhat deceiving. A software RAID controller may in fact appear to provide adequate performance when only one user is logged in to the console and is simply running a benchmark. However, the performance of these controllers in a real-world environment is often inadequate. As the number of users on a server increases, so does the load on the server CPU.

We already saw that IBM's Logical Volume Manager allows disk spanning. Now that we have discussed the various levels of RAID, we can see that LVM's disk spanning feature resembles the NRAID arrangement described earlier, and that it is implemented in software, since the operating system (OS/2's native device drivers) takes care of it. Notice that the same dangers apply as with RAID-0: if you span several partitions (and note that each partition is treated as a hard disk), and one or more of the partitions fail, chances are high that you will lose all data. LVM also offers a kind of software implementation of hot swapping: the Expand Volume command in the menu bar. For more information about LVM, please refer to the online documentation in your OS/2 Warp system: Using the Logical Volume Manager, located in your Assistance Center folder.

Besides the techniques LVM uses for disk spanning and HPFS386's fault tolerance features, we'll discuss the only software solution available for OS/2. This is a shareware program called VRAID 2.3, which includes the device driver VRAID.FLT. This driver intercepts the operations of OS2DASD.DMD (OS/2's general-purpose driver that offers device support for hard disk controllers) and combines non-removable disks into RAID arrays. Just as is the case with hardware RAID solutions, it presents the array of disks to the operating system as one (huge) ordinary disk.