© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
ARC303
Navigate the storm: Unleashing controlled chaos for resilient systems
Siriat Kongdee, Principal Solutions Architect, Amazon Web Services
Laurent Domb, Chief Technologist, WW Federal Financial Services; WW Chaos Engineering Lead, Amazon Web Services
Top productivity challenges in software engineering teams*
1. Putting out fires
2. Improve business satisfaction
3. Improve software quality
Source: *2022 Gartner Software Engineering Leaders Survey, Gartner®, Market Guide for Chaos Engineering Tools, August 24, 2023.
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.
The shared responsibility model for resilience
COMPUTE STORAGE DATABASE NETWORKING
AVAILABILITY ZONES
The resilience of the cloud
How does the resilience of the cloud impact your application in the cloud?
Resilience insights through AWS Fault Injection Service
COMPUTE STORAGE DATABASE NETWORKING
AVAILABILITY ZONES
AWS Fault Injection Service targets and actions
Compute: Amazon EC2, Amazon EKS, Amazon ECS
Storage: Amazon S3, Amazon EBS
Networking: Amazon VPC
Database: Amazon RDS, Amazon DynamoDB
Management: Amazon CloudWatch, AWS Systems Manager
A resilience-focused mindset
Resilience-focused phases:
Prior to an event: Robustness. The ability to absorb shocks and keep operating.
During an event: Resourcefulness. The ability to manage a disruption as it unfolds.
After an event: Rapid recovery. The ability to get back to normal as quickly as possible.
Post-event learning: Adaptability/lessons learned. The ability to absorb new lessons after an event.
Source: The Sequence of the NIAC Resilience Construct
Types of resilience experimentation
Manual experiments (build and refine experiments): Ad hoc, GameDay
Automated experiments (run approved and validated experiments): CI/CD pipeline, Canary release, Continuous experimentation
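The automated column lists a CI/CD pipeline as one place to run approved experiments. The following is a minimal sketch of how a pipeline step might wait for an experiment and decide whether to continue. `get_state` is a hypothetical callable standing in for whatever lookup returns the experiment status, and the state names are assumed from the AWS FIS experiment lifecycle; verify both against the service documentation before relying on them.

```python
import time
from typing import Callable

# Terminal states, assumed from the AWS FIS experiment lifecycle.
TERMINAL_STATES = {"completed", "stopped", "failed"}


def wait_for_experiment(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the experiment reaches a terminal state or the timeout expires."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"experiment still '{state}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


def pipeline_gate(final_state: str) -> bool:
    """Let the pipeline continue only if the experiment ran to completion.

    A 'stopped' experiment usually means a stop condition fired,
    so this sketch treats it as a failed gate.
    """
    return final_state == "completed"
```

Passing the status lookup in as a callable keeps the gate logic testable without an AWS account; in a real pipeline the callable would wrap the service call.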
Prevent fires through continuous resilience testing
Continuous resilience testing cycle:
1. Define the objective
2. Select the target
3. Align mental maps
4. Address the knowns
5. Define the hypothesis and experiment
6. Ensure operational readiness for the experiment
7. Execute controlled experiments
8. Learn and fine-tune scenarios
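The hypothesis step in the cycle can be made concrete by writing the steady-state expectation down as data and checking it against what was observed during the experiment. The sketch below is illustrative only: the `Hypothesis` fields, the choice of p90 latency and error rate as metrics, and any threshold values are assumptions, not figures from the session.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hypothesis:
    """Steady-state expectation that should still hold while the fault is active."""
    description: str
    max_p90_latency_ms: float  # hypothetical threshold
    max_error_rate: float      # hypothetical threshold, as a fraction of requests


def p90(values: Sequence[float]) -> float:
    """Nearest-rank 90th percentile of the collected samples."""
    if not values:
        raise ValueError("no samples collected")
    ordered = sorted(values)
    rank = (90 * len(ordered) + 99) // 100  # ceil(0.9 * n) with integer math
    return ordered[rank - 1]


def evaluate(h: Hypothesis, latencies_ms: Sequence[float], errors: int, requests: int) -> dict:
    """Compare observations from the experiment window with the hypothesis."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    observed_p90 = p90(latencies_ms)
    error_rate = errors / requests
    return {
        "p90_latency_ms": observed_p90,
        "error_rate": error_rate,
        "upheld": observed_p90 <= h.max_p90_latency_ms and error_rate <= h.max_error_rate,
    }
```

A hypothesis that is not upheld is a finding to feed into the learn and fine-tune step, not a failure of the experiment itself.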
Return on investment in chaos engineering
Chart axes: return on investment, effort
Tier 1/Platinum: RTO < 2 hours, RPO < 30 seconds
Tier 2/Gold: RTO < 8 hours, RPO < 4 hours
Tier 3/Silver: RTO 24 hours, RPO 24 hours
Tier 4/Bronze: RTO 48+ hours, RPO 72 hours
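The tier targets can be used to check what an experiment actually measured. The sketch below encodes the four tiers from the slide and returns the strictest tier whose RTO and RPO targets an observed recovery satisfies. It makes two simplifying assumptions: every bound is treated as inclusive, and the open-ended "48+ hours" RTO for Tier 4 is modeled as no upper limit.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    rto: Optional[timedelta]  # None means no upper bound (the "48+ hours" case)
    rpo: timedelta


# Targets as listed on the slide, ordered from strictest to most relaxed.
TIERS = [
    Tier("Tier 1/Platinum", timedelta(hours=2), timedelta(seconds=30)),
    Tier("Tier 2/Gold", timedelta(hours=8), timedelta(hours=4)),
    Tier("Tier 3/Silver", timedelta(hours=24), timedelta(hours=24)),
    Tier("Tier 4/Bronze", None, timedelta(hours=72)),
]


def strictest_tier_met(recovery_time: timedelta, data_loss_window: timedelta) -> Optional[Tier]:
    """Return the strictest tier whose RTO and RPO targets were both met, if any."""
    for tier in TIERS:
        rto_ok = tier.rto is None or recovery_time <= tier.rto
        rpo_ok = data_loss_window <= tier.rpo
        if rto_ok and rpo_ok:
            return tier
    return None
```

For example, a recovery that took 3 hours with 5 minutes of data loss misses the Tier 1 targets but meets Tier 2.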
AWS Fault Injection Service: Flow
OVERVIEW
Entry points: AWS Management Console, AWS CLI
Access control: AWS Identity and Access Management
AWS FIS: experiment template, FIS engine, FIS safeguards (start experiment, stop experiment)
AWS resources (targets): Compute, Databases, Networking, Storage
Monitoring: Amazon EventBridge, Amazon CloudWatch alarms, third party
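The pieces in this flow (an IAM role, targets, actions, and a CloudWatch alarm acting as a safeguard) come together in the experiment template. The sketch below builds one such template as a plain dictionary. The function name, the target and action labels (`oneInstance`, `stopInstance`), and the five-minute restart delay are illustrative assumptions; the field names and the `aws:ec2:stop-instances` action follow the FIS template format as I understand it and should be checked against the current documentation.

```python
import json


def build_stop_instance_template(role_arn: str, alarm_arn: str, tag_key: str, tag_value: str) -> dict:
    """Build an experiment template that stops one tagged EC2 instance.

    The CloudWatch alarm is the stop condition: if it goes into alarm,
    the experiment is halted.
    """
    return {
        "description": "Stop one tagged EC2 instance and restart it after 5 minutes",
        "roleArn": role_arn,  # role the service assumes to act on the targets
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn},
        ],
        "targets": {
            "oneInstance": {  # hypothetical target label
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "stopInstance": {  # hypothetical action label
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "oneInstance"},
            }
        },
    }


def to_json(template: dict) -> str:
    """Serialize the template, for example to pass to the AWS CLI as a file."""
    return json.dumps(template, indent=2, sort_keys=True)
```

Building the template as data first makes it easy to review and version before anything is created in an account. With placeholder ARNs filled in, the dictionary could likely be passed to the boto3 `fis` client's `create_experiment_template` call along with a client token.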
Our application for today
Application architecture
Insights
Workshop focuses on new actions
Areas covered: Scenarios, Scheduling, Amazon EC2, Amazon EKS, Amazon ECS, Amazon S3, Amazon DynamoDB
Fault types shown: Network isolation, Fill disk, Pause I/O, Latency
Let’s get started
Workshop
https://catalog.workshops.aws/fis-v2
Getting started with this workshop
You have access to an AWS account with any optional pre-provisioned
infrastructure and IAM policies needed to complete this workshop.
The AWS account is only available for the duration of this workshop.
You will lose access to the account once the workshop is complete.
Any optional pre-provisioned infrastructure is deployed to a specific AWS Region.
Make sure that you are working in this Region; other Regions are blocked.
Review the terms and conditions of the event. Do not upload any
personal or confidential information to the account.
Step 1: Sign in using your preferred method
https://catalog.workshops.aws/join
Step 2: Enter the event access code
Each session has a unique code
Step 3: Review terms and join event
Step 4: Get started with the workshop
Step 5: Access AWS account
Access the AWS Management Console or
generate AWS CLI credentials as needed
Event access code
2c9f-02c9a5-aa
Thank you! Please complete the session survey in the mobile app