What should you do next?

You are on-call for an infrastructure service that has a large number of dependent systems. You receive an alert indicating that the service is failing to serve most of its requests, affecting all of its dependent systems and their hundreds of thousands of users. As part of your Site Reliability Engineering (SRE) incident management protocol, you declare yourself Incident Commander (IC) and pull in two experienced people from your team as Operations Lead (OL) and Communications Lead (CL).

What should you do next?
A. Look for ways to mitigate user impact and deploy the mitigations to production.
B. Contact the affected service owners and update them on the status of the incident.
C. Establish a communication channel where incident responders and leads can communicate with each other.
D. Start a postmortem, add incident information, circulate the draft internally, and ask internal stakeholders for input.

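Answer: C

With the IC, OL, and CL roles assigned, the next step is to give responders and leads a shared place to coordinate. Mitigation (A) is the OL's work and stakeholder updates (B) are the CL's work, and both rely on that channel being in place. The postmortem (D) belongs after the incident is under control.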
