11/8/17

Resilient Architectures - Voxxed Days Belgrade 2017

Matt Stine’s presentation on resilient architectures discusses the critical importance of designing robust systems capable of handling failures effectively, particularly within enterprise IT environments. The talk emphasizes the severe impact of software failures in sectors such as retail and aviation, underscoring the economic and operational consequences of such disruptions. Matt advocates a shift away from traditional mistake-prevention strategies, which often hinder rapid issue resolution, towards embracing failure and improving system recoverability.

Matt outlines practical approaches for building resilience: improving the observability of systems, applying well-known resiliency patterns, and embracing chaos engineering to test systems under stress. Observability makes it possible to understand how actions affect system health, with service level indicators and service level objectives defining what counts as acceptable performance. Resiliency patterns such as timeouts, retries, bulkheads, and circuit breakers help keep a system operating during failures, while chaos engineering deliberately introduces faults to assess and improve system robustness.
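
To make two of the patterns named above more concrete, here is a minimal Python sketch of a circuit breaker combined with a retry helper. It is an illustration under stated assumptions, not code from the talk: the names `CircuitBreaker`, `CircuitOpenError`, and `retry`, and the default threshold and timeout values, are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical sketch: fail fast after repeated failures of a dependency."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()       # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry func with exponential backoff; never retry against an open circuit."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A caller might wrap a remote call as `retry(lambda: breaker.call(fetch_inventory))`, where `fetch_inventory` stands in for any function that can fail. The retry absorbs transient errors, while the breaker stops traffic to a dependency that keeps failing and probes it again once the reset timeout has passed.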

In conclusion, by shifting focus from merely preventing failures to recovering from them quickly, implementing comprehensive observability, and integrating structured chaos into system testing, enterprises can make their software architectures more resilient. This proactive approach means systems are not only prepared to handle unexpected failures but are also continuously tested and improved, supporting consistent operational reliability and user trust.
