What’s the difference between Reliability, Durability, and Availability for a data storage system?
These are important concepts in distributed systems such as the Hadoop Distributed File System, the Google File System, and so on.
The difference between durability and availability is fairly simple. Durability is about what happens when all power goes out everywhere. Has all data been written to stable storage that doesn’t require power (e.g. disk/flash), in a form that allows it to be brought back and used? Availability is about what happens when there’s a partial failure – a disk, a node, a network. Does the system continue to operate and provide the same services it originally did?
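To make the durability half concrete, here is a minimal Python sketch of what "written to stable storage" can involve on a single machine. The function name `durable_write` and the write-then-rename approach are my own assumptions for illustration, not a description of how HDFS or GFS do it. It uses only the standard `os.fsync` and `os.replace` calls, and whether the bytes truly reach the disk or flash still depends on the OS, filesystem, and device honoring the flush.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: return only after data has been flushed toward stable storage."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it into place,
    # so a crash leaves either the old contents or the new ones, not a torn mix.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's userspace buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push its cache to the device
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Assumption: on POSIX systems, also fsync the directory so the rename is persisted.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Calling `durable_write("record.bin", b"payload")` and then cutting the power is the durability question. It says nothing about availability, which is about whether some other disk or node can still serve `record.bin` while this one is down.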
Availability comes in multiple forms. To people who have worked on old-school HA – heartbeat, pairwise failover, address takeover, STONITH – it means system availability – the system as a whole continues to provide the original service. In a more recent CAP Theorem context it means node availability – the individual nodes (except those that have failed) continue to provide the original service. This precludes shutting down non-quorum nodes to preserve consistency, which is a common solution in the older HA world. This difference causes a lot of confusion, just like the C in CAP vs. the C in ACID, but it’s pretty well entrenched so you just have to keep the audience in mind when talking about availability.
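The two meanings can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the `Node` class, the `"quorum"` and `"always"` policy names, and the helper functions are my assumptions, not any real HA or database API. The sketch models a five-node cluster split 3/2 by a network partition and asks each question separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    cluster_size: int     # total nodes configured in the cluster
    reachable_peers: int  # peers this node can currently contact, excluding itself

    def has_quorum(self) -> bool:
        # Strict majority of the full cluster, counting this node itself.
        return (self.reachable_peers + 1) > self.cluster_size // 2


def serves_requests(node: Node, policy: str) -> bool:
    """Hypothetical policies: 'quorum' (older HA style) or 'always' (keep every live node serving)."""
    if policy == "quorum":
        return node.has_quorum()  # non-quorum nodes stop serving to preserve consistency
    if policy == "always":
        return True
    raise ValueError(f"unknown policy: {policy}")


def system_available(nodes: List[Node], policy: str) -> bool:
    # Old-school HA sense: the system as a whole still provides the service.
    return any(serves_requests(n, policy) for n in nodes)


def node_available(nodes: List[Node], policy: str) -> bool:
    # CAP sense: every non-failed node still provides the service.
    return all(serves_requests(n, policy) for n in nodes)


# Five live nodes partitioned into a group of three and a group of two.
majority = [Node(f"a{i}", cluster_size=5, reachable_peers=2) for i in range(3)]
minority = [Node(f"b{i}", cluster_size=5, reachable_peers=1) for i in range(2)]
cluster = majority + minority
```

Under the `"quorum"` policy, `system_available(cluster, "quorum")` is true while `node_available(cluster, "quorum")` is false, because the two minority nodes are alive but refuse requests. Under `"always"` both are true, at the cost of the two sides possibly diverging.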
Some people use “reliable” as a synonym for “available”. Some use it to distinguish system availability from node availability[1]. Some use it to mean the ability to reach consensus despite faults[2] (basically the C in CAP). Most people are just plain sloppy and don’t have any particular definition in mind (much like “fast” or “scalable”). Because there’s no consensus (heh) on its meaning, I’d suggest avoiding the term.
[1] 14 An Introduction to Distributed Systems http://webdam.inria.fr/Jorge/html/wdmch15.html
[2] Reliability of Distributed Systems http://www.cse.scu.edu/~jholliday/REL-EAR.htm