Info

This is a summary of Chapter 38 from the book Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. The chapter covers Redundant Arrays of Inexpensive Disks (RAID), explaining the different RAID levels, their advantages, trade-offs, and performance characteristics.

1. Introduction

Disks are often slow, limited in capacity, and vulnerable to failure. RAID systems address these issues by combining multiple disks to improve:

  1. Performance – Faster I/O operations via parallelism.
  2. Capacity – Larger storage by combining multiple disks.
  3. Reliability – Fault tolerance by introducing redundancy.

Key Questions:

  • How can RAID enhance disk storage?
  • What trade-offs exist between different RAID levels?

2. RAID Overview

RAID presents a transparent interface to the operating system, appearing as a single large disk. Internally, it consists of:

  • Multiple disks working together.
  • A RAID controller managing disk interactions.
  • Memory (DRAM/NVRAM) for caching and buffering.
  • Specialized hardware/software for redundancy management.

RAID Benefits:

  • Improves performance through parallel I/O.
  • Increases storage capacity by aggregating disks.
  • Enhances reliability via redundancy mechanisms.

3. RAID Fault Model

RAID assumes a fail-stop model:

  • Disks are either fully operational or completely failed.
  • Failures are immediately detectable by the RAID controller.
  • More complex failures (e.g., silent corruption, latent sector errors) are not initially considered but are relevant in real-world scenarios.

4. RAID Performance Metrics

RAID levels are evaluated based on:

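  1. Capacity – Usable storage after redundancy, given N disks of B blocks each.
  2. Reliability – Fault tolerance (number of disks that can fail).
  3. Performance – Impact on read/write speeds, expressed with S and R, the sequential and random bandwidth of a single disk.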

5. RAID Levels and Analysis

RAID 0 (Striping)

  • No redundancy (not technically a RAID).
  • Distributes data across disks to maximize performance.
  • Capacity: $N \cdot B$ (every block of all N disks is usable).
  • Reliability: 0 (failure of one disk = data loss).
  • Performance:
    • Sequential Read/Write: $N \cdot S$ (uses all disks).
    • Random Read/Write: $N \cdot R$.
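
Striping can be sketched as a simple address calculation. The snippet below is a minimal illustration, assuming round-robin placement of fixed-size chunks; the function name `locate_block` and the parameter `chunk_blocks` are hypothetical names chosen for this example.

```python
def locate_block(logical_block, num_disks, chunk_blocks=1):
    """Map a logical block address to (disk index, block offset on that disk)."""
    chunk = logical_block // chunk_blocks      # which chunk the block belongs to
    stripe = chunk // num_disks                # which stripe (row) holds that chunk
    disk = chunk % num_disks                   # chunks are dealt round-robin to disks
    offset = stripe * chunk_blocks + logical_block % chunk_blocks
    return disk, offset


# With 4 disks and one block per chunk, consecutive blocks land on consecutive disks.
for block in range(8):
    print(block, locate_block(block, num_disks=4))
```

A larger chunk size keeps neighbouring blocks on the same disk, which reduces parallelism within a single file but also reduces positioning overhead for requests that would otherwise span several disks.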

RAID 1 (Mirroring)

  • Each disk has an exact copy on another disk.
  • Tolerates single disk failure (or more, if lucky).
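  • Capacity: $(N \cdot B)/2$.
  • Reliability: 1 disk guaranteed; up to $N/2$ if every failed disk belongs to a different mirror pair.
  • Performance:
    • Reads: $N \cdot R$ random (can use either copy); $(N/2) \cdot S$ sequential, since each disk skips the blocks served by its mirror.
    • Writes: $(N/2) \cdot S$ sequential, $(N/2) \cdot R$ random (must write to both copies).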
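
The "or more, if lucky" case can be made concrete with a small check. This is a sketch only, assuming disks are mirrored in adjacent pairs (0,1), (2,3), and so on; the function name `mirror_survives` is hypothetical.

```python
def mirror_survives(failed, num_disks):
    """Return True if every mirror pair still has at least one working disk."""
    for first in range(0, num_disks, 2):
        # Data is lost only when both members of the same pair have failed.
        if first in failed and first + 1 in failed:
            return False
    return True


print(mirror_survives({0, 2}, num_disks=4))  # two failures in different pairs
print(mirror_survives({0, 1}, num_disks=4))  # two failures in the same pair
```

Under this pairing, two failures are survivable only when they hit different pairs, which is why a single failure is the only guaranteed tolerance.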

RAID 4 (Parity-Based)

  • Uses one disk for parity calculations.
  • Recovers lost data using XOR calculations.
  • Capacity: $(N-1) \cdot B$.
  • Reliability: 1 disk failure tolerance.
  • Performance:
    • Sequential Read/Write: $(N-1) \cdot S$.
    • Random Read: $(N-1) \cdot R$.
    • Random Write: $R/2$, severely bottlenecked by the parity disk.
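
XOR-based recovery can be illustrated in a few lines. The sketch below uses made-up two-byte blocks, and the helper name `xor_blocks` is hypothetical: the parity block is the XOR of all data blocks in a stripe, and a lost block is the XOR of everything that survives.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of any number of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"\x0f\xa0", b"\x33\x0c", b"\x55\xff"]   # one stripe on three data disks
parity = xor_blocks(data)                          # stored on the parity disk

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The same property explains the single-failure limit: with two blocks missing from a stripe, one parity equation cannot determine both.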

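🔴 Small-Write Problem: Every small write must read and then rewrite a block on the single parity disk, so random writes are serialized there and throughput does not improve as disks are added.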
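
The cost of a small write is easier to see with the subtractive parity update, where the new parity is computed from the old data, the new data, and the old parity. The snippet is a minimal sketch with made-up block contents; `xor_bytes` and `parity_after_small_write` are hypothetical helper names.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_after_small_write(old_data, new_data, old_parity):
    """Subtractive update: flip exactly the parity bits whose data bits changed."""
    return xor_bytes(xor_bytes(old_data, new_data), old_parity)


stripe = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]
old_parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

# Overwrite one block: read old data + old parity, write new data + new parity.
new_block = b"\xff\x00"
new_parity = parity_after_small_write(stripe[1], new_block, old_parity)
stripe[1] = new_block

# The shortcut agrees with recomputing parity over the whole stripe.
full_recompute = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])
print(new_parity == full_recompute)
```

Each logical write therefore costs four physical I/Os (two reads, two writes), and two of them always hit the parity disk, which is where the $R/2$ random-write throughput of RAID 4 comes from.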


RAID 5 (Rotating Parity)

  • Same as RAID 4, but parity is distributed across disks.
  • Reduces the parity disk bottleneck, allowing better performance.
  • Capacity: $(N-1) \cdot B$.
  • Reliability: 1 disk failure tolerance.
  • Performance:
    • Sequential Read/Write: $(N-1) \cdot S$.
    • Random Read: $N \cdot R$.
    • Random Write: $(N/4) \cdot R$ (better than RAID 4).
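
Rotating parity amounts to choosing a different parity disk for each stripe. The sketch below assumes one common arrangement (parity starts on the last disk and moves one disk to the left per stripe, with data following the parity block); real arrays may use other layouts, and the name `raid5_locate` is hypothetical.

```python
def raid5_locate(logical_block, num_disks):
    """Return (stripe, data disk, parity disk) for a logical block."""
    data_per_stripe = num_disks - 1                 # one block per stripe is parity
    stripe = logical_block // data_per_stripe
    index = logical_block % data_per_stripe         # position within the stripe
    parity_disk = (num_disks - 1) - (stripe % num_disks)
    data_disk = (parity_disk + 1 + index) % num_disks
    return stripe, data_disk, parity_disk


# Parity placement for the first five stripes of a 5-disk array.
print([raid5_locate(s * 4, 5)[2] for s in range(5)])
```

Because consecutive stripes use different parity disks, small writes to different stripes can update their parity blocks in parallel instead of queueing on a single disk.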

6. Summary Table

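| RAID | Capacity | Reliability | Seq. Read | Seq. Write | Rand. Read | Rand. Write |
|---|---|---|---|---|---|---|
| RAID 0 | $N \cdot B$ | 0 | $N \cdot S$ | $N \cdot S$ | $N \cdot R$ | $N \cdot R$ |
| RAID 1 | $(N \cdot B)/2$ | 1 (or more if lucky) | $(N/2) \cdot S$ | $(N/2) \cdot S$ | $N \cdot R$ | $(N/2) \cdot R$ |
| RAID 4 | $(N-1) \cdot B$ | 1 disk | $(N-1) \cdot S$ | $(N-1) \cdot S$ | $(N-1) \cdot R$ | $R/2$ (Bottlenecked) |
| RAID 5 | $(N-1) \cdot B$ | 1 disk | $(N-1) \cdot S$ | $(N-1) \cdot S$ | $N \cdot R$ | $(N/4) \cdot R$ |

N = number of disks, B = blocks per disk, S and R = sequential and random bandwidth of a single disk.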
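
To compare levels for a concrete array, the chapter's formulas can be evaluated directly. The snippet below is a sketch: the function name `raid_metrics` and the example values (4 disks, 1000 blocks each, S = 50 and R = 1 in MB/s) are assumptions for illustration, not measurements.

```python
def raid_metrics(level, n, b, s, r):
    """Capacity, guaranteed fault tolerance, and throughput for one RAID level.

    n = number of disks, b = blocks per disk,
    s / r = sequential / random bandwidth of a single disk.
    """
    if level == 0:
        cap, tol = n * b, 0
        seq_r = seq_w = n * s
        rand_r = rand_w = n * r
    elif level == 1:
        cap, tol = n * b / 2, 1
        seq_r = seq_w = n / 2 * s
        rand_r, rand_w = n * r, n / 2 * r
    elif level == 4:
        cap, tol = (n - 1) * b, 1
        seq_r = seq_w = (n - 1) * s
        rand_r, rand_w = (n - 1) * r, r / 2      # parity disk limits random writes
    elif level == 5:
        cap, tol = (n - 1) * b, 1
        seq_r = seq_w = (n - 1) * s
        rand_r, rand_w = n * r, n / 4 * r        # four I/Os per small write
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return {
        "capacity": cap, "tolerated_failures": tol,
        "seq_read": seq_r, "seq_write": seq_w,
        "rand_read": rand_r, "rand_write": rand_w,
    }


for level in (0, 1, 4, 5):
    print(level, raid_metrics(level, n=4, b=1000, s=50, r=1))
```

Varying `n` shows the main contrast between the parity schemes: RAID 5 random-write throughput grows with the number of disks, while RAID 4 stays fixed at half of one disk's random bandwidth.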

7. Conclusion

  • RAID 0: Best performance, no reliability.
  • RAID 1: Best reliability and strong random I/O, but expensive (requires 2x storage).
  • RAID 4: Efficient storage, poor random-write performance (single parity disk).
  • RAID 5: Good balance of reliability and storage efficiency; small random writes remain costly.

Choosing the right RAID level depends on whether performance, reliability, or storage efficiency is the priority.