System reliability, availability and robustness are often not well understood by system architects, engineers and developers. Many are unsure what drives customers' availability expectations, how to frame verifiable availability and robustness requirements, how to manage and budget availability and robustness, or how to methodically architect and design systems that meet those requirements. The book takes a pragmatic approach, framing reliability and robustness as a functional aspect of a system so that architects, designers, developers and testers can address it as a concrete, functional attribute of a system rather than an abstract, non-functional notion.
Figures.
Tables.
Preface.
Acknowledgements.
PART ONE RELIABILITY BASICS.
1 Reliability and Availability Concepts.
1.1 Reliability and Availability.
1.2 Faults, Errors and Failures.
1.3 Error Severity.
1.4 Failure Recovery.
1.5 Highly Available Systems.
1.6 Quantifying Availability.
1.7 Outage Attributability.
1.8 Hardware Reliability.
1.9 Software Reliability.
1.10 Problems.
1.11 For Further Study.
2 System Basics.
2.1 Hardware and Software.
2.2 External Entities.
2.3 System Management.
2.4 System Outages.
2.5 Service Quality.
2.6 Total Cost of Ownership.
2.7 Problems.
3 What Can Go Wrong.
3.1 Failures in the Real World.
3.2 Eight-Ingredient Framework.
3.3 Mapping Ingredients to Error Categories.