Update the Systems Manager parameter to turn on error mode in the PayForAdoption service.
aws ssm put-parameter --name '/petstore/errormode1' --value 'true' --overwrite
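To confirm that the parameter was updated, you can read it back. The following is a minimal sketch; the helper name `get_error_mode` is made up for illustration, while the parameter name comes from the command above.

```shell
# Hypothetical helper (not part of the workshop): print the current value
# of the error-mode parameter set by the put-parameter command.
get_error_mode() {
  aws ssm get-parameter \
    --name '/petstore/errormode1' \
    --query 'Parameter.Value' \
    --output text
}

# Usage: get_error_mode
# After the put-parameter call above, this should print the value you set.
```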
Next, launch 5 instances of the trafficgenerator container to simulate increased load:
PETLISTADOPTIONS_CLUSTER=$(aws ecs list-clusters | jq '.clusterArns[]|select(contains("PetList"))' -r)
TRAFFICGENERATOR_SERVICE=$(aws ecs list-services --cluster $PETLISTADOPTIONS_CLUSTER | jq '.serviceArns[]|select(contains("trafficgenerator"))' -r)
aws ecs update-service --cluster $PETLISTADOPTIONS_CLUSTER --service $TRAFFICGENERATOR_SERVICE --desired-count 5
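If you want to check that the new tasks have actually started, you can compare the desired and running task counts. This is a sketch that assumes `PETLISTADOPTIONS_CLUSTER` and `TRAFFICGENERATOR_SERVICE` are still set in your shell from the commands above; the function name is hypothetical.

```shell
# Hypothetical helper: print the desired and running task counts for the
# traffic generator service. Relies on the two variables set above.
show_trafficgenerator_counts() {
  aws ecs describe-services \
    --cluster "$PETLISTADOPTIONS_CLUSTER" \
    --services "$TRAFFICGENERATOR_SERVICE" \
    --query 'services[0].[desiredCount,runningCount]' \
    --output text
}

# Usage: show_trafficgenerator_counts
# Both numbers should match once the scale-out has finished.
```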
Now that the traffic load has increased, wait about 10 minutes and open the ServiceLens console. You should see the PayForAdoption service node in red, indicating that the service is returning HTTP 500 errors.
Click the node to see that the service is returning a large number of HTTP 500 responses.
Click View in Container Insights and select the PayForAdoptions service. The metric widget shows the memory fluctuation in the service, which is contributing to the issue.
Now click the View traces button to see traces from that service. On the next screen, select Node status from the Filter type list, tick the Faults checkbox, and click the Update node status filter button.
Scroll all the way down and click one of the traces. All of the traces listed contain an HTTP 500 error.
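The same fault filter can be approximated from the command line with an X-Ray filter expression. This is only a sketch: the function name is hypothetical, and the service name you pass must match the node name as it appears in your service map, which is an assumption here.

```shell
# Hypothetical helper: list IDs of traces from the last 10 minutes in which
# the named service reported a fault (5xx).
# $1 = service name as shown in the service map (assumed, check your console)
list_fault_traces() {
  service_name="$1"
  now=$(date +%s)
  aws xray get-trace-summaries \
    --start-time "$((now - 600))" \
    --end-time "$now" \
    --filter-expression "service(\"$service_name\") { fault }" \
    --query 'TraceSummaries[].Id' \
    --output text
}

# Usage: list_fault_traces <service-name>
```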
On the next screen, the segments timeline shows the 500 response code from the payforadoption service, as shown below.
CloudWatch ServiceLens can automatically correlate traces, logs, and metrics to help identify the root cause. Scroll all the way down to the Logs section to see the actual error from the application, as shown below. You can also dive deeper into the log data and analyze it by clicking the View in CloudWatch Logs Insights button.
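A Logs Insights query can also be started from the CLI. The sketch below assumes you know the log group name of the service (it is not given in this section, so it is passed as an argument), and the query string is just one plausible way to surface error lines; the function name is hypothetical.

```shell
# Hypothetical helper: start a Logs Insights query over the last 15 minutes
# that returns the 20 most recent messages containing "error"
# (case-insensitive). Prints the query ID.
# $1 = log group name (assumption: look it up in the CloudWatch console)
start_error_query() {
  log_group="$1"
  now=$(date +%s)
  aws logs start-query \
    --log-group-name "$log_group" \
    --start-time "$((now - 900))" \
    --end-time "$now" \
    --query-string 'fields @timestamp, @message | filter @message like /(?i)error/ | sort @timestamp desc | limit 20' \
    --query 'queryId' \
    --output text
}

# Usage:
#   QUERY_ID=$(start_error_query <log-group-name>)
#   aws logs get-query-results --query-id "$QUERY_ID"
```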
The following GIF shows the sequence of actions
When you are done investigating, scale the trafficgenerator service back down to 1 instance:
PETLISTADOPTIONS_CLUSTER=$(aws ecs list-clusters | jq '.clusterArns[]|select(contains("PetList"))' -r)
TRAFFICGENERATOR_SERVICE=$(aws ecs list-services --cluster $PETLISTADOPTIONS_CLUSTER | jq '.serviceArns[]|select(contains("trafficgenerator"))' -r)
aws ecs update-service --cluster $PETLISTADOPTIONS_CLUSTER --service $TRAFFICGENERATOR_SERVICE --desired-count 1
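Rather than guessing when the scale-in has finished, you can block until ECS reports the service as stable. A minimal sketch, assuming the two variables above are still set; the function name is hypothetical.

```shell
# Hypothetical helper: wait until the traffic generator service has reached
# a steady state (running count matches desired count).
wait_for_trafficgenerator() {
  aws ecs wait services-stable \
    --cluster "$PETLISTADOPTIONS_CLUSTER" \
    --services "$TRAFFICGENERATOR_SERVICE"
}

# Usage: wait_for_trafficgenerator && echo "service is stable"
```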
Update the Systems Manager parameter to turn off error mode in the PayForAdoption service.
aws ssm put-parameter --name '/petstore/errormode1' --value 'false' --overwrite
Wait a few minutes, then check the CloudWatch Container Insights screen to confirm that memory utilization is no longer fluctuating heavily.