Troubleshoot memory issue

Enable errormode for the PayForAdoption service

  1. Navigate to your Cloud9 terminal and execute the following commands:

This command updates the Systems Manager Parameter to turn on error mode in the PayForAdoption service.

aws ssm put-parameter --name '/petstore/errormode1' --value 'true' --overwrite
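
If you want to confirm the change took effect before moving on, you can read the parameter back. This is an optional check and uses the same parameter name as above:

# Optional check: read the parameter back; it should print "true"
aws ssm get-parameter --name '/petstore/errormode1' --query 'Parameter.Value' --output text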

Increase the traffic generator count for additional load

  1. Execute the following commands in the terminal:

These commands scale the trafficgenerator ECS service to 5 tasks to simulate additional load.

PETLISTADOPTIONS_CLUSTER=$(aws ecs list-clusters | jq '.clusterArns[]|select(contains("PetList"))' -r)
TRAFFICGENERATOR_SERVICE=$(aws ecs list-services --cluster $PETLISTADOPTIONS_CLUSTER | jq '.serviceArns[]|select(contains("trafficgenerator"))' -r)
aws ecs update-service --cluster $PETLISTADOPTIONS_CLUSTER --service $TRAFFICGENERATOR_SERVICE --desired-count 5
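
Optionally, you can confirm that the additional tasks have started. The following commands reuse the shell variables defined above and only read service state:

# Optional: show desired vs. running task counts for the trafficgenerator service
aws ecs describe-services --cluster $PETLISTADOPTIONS_CLUSTER --services $TRAFFICGENERATOR_SERVICE --query 'services[0].[desiredCount,runningCount]' --output text

# Optional: block until the service reaches a steady state with the new task count
aws ecs wait services-stable --cluster $PETLISTADOPTIONS_CLUSTER --services $TRAFFICGENERATOR_SERVICE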

Let’s find out what happens

  1. After 10 minutes have passed, navigate to the ServiceLens console.

Since the traffic load has increased, you should see the PayForAdoption service node colored red. This indicates that the service is returning HTTP 500 errors.

Issue with PayForAdoptions

  1. Click on the node with red coloring.

When you click on the node, you will see that the service is throwing a large number of HTTP 500 errors.

Trace metrics

  1. Click on View in Container Insights and select the PayForAdoption service.

You will see the metric widget showing the memory fluctuation in the service. This is contributing to the performance issue.

Container Insights Mem Issue
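
If you prefer the command line, you can pull the same service-level memory metric that the Container Insights widget plots. This is a sketch that uses placeholder cluster and service names; substitute the names shown in your Container Insights console:

# Sketch: query the memory metric published by Container Insights for the last 30 minutes.
# Replace <your-cluster-name> and <your-service-name> with the values from the console.
aws cloudwatch get-metric-statistics \
  --namespace ECS/ContainerInsights \
  --metric-name MemoryUtilized \
  --dimensions Name=ClusterName,Value=<your-cluster-name> Name=ServiceName,Value=<your-service-name> \
  --statistics Average \
  --period 60 \
  --start-time $(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ)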

  1. Click on the View traces button to see traces from that service.
  2. Select Node status from the Filter type list, and tick the checkbox that says Faults.
  3. Click the Update node status filter button.

Filter traces

  1. Scroll down to the bottom and click on one of the traces.

The traces listed are the ones that contain HTTP 500 errors.

Trace list

On this screen, you should see the segments timeline showing the 500 response code from the payforadoption service, as shown below.
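
The same fault traces can also be retrieved from the Cloud9 terminal with the X-Ray CLI. This is an optional sketch; the service name in the filter expression is an assumption and should match the node name you clicked in the service map:

# Sketch: list trace summaries from the last 10 minutes that contain faults (5xx).
# The service name "payforadoption" is an assumption; use the node name from your service map.
aws xray get-trace-summaries \
  --start-time $(date -u -d '10 minutes ago' +%s) \
  --end-time $(date -u +%s) \
  --filter-expression 'service("payforadoption") { fault = true }'

# Sketch: fetch the full segment documents for one of the returned trace IDs.
aws xray batch-get-traces --trace-ids <trace-id>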

Look at logs

CloudWatch ServiceLens can automatically correlate traces, logs and metrics to identify the root cause.

  1. Scroll down to the Logs section.

Here you will see the actual error from the application, as shown below. You can also dive deeper into the log data and analyze it by clicking the View in CloudWatch Logs Insights button.

Look at logs

The following GIF shows the sequence of actions.

Look at logs
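
You can run a similar query from the terminal with the CloudWatch Logs Insights API. This is a sketch; the log group name below is a placeholder and should be replaced with the PayForAdoption log group you see in the console:

# Sketch: start a Logs Insights query over the last 15 minutes.
# The log group name is a placeholder; use the group shown in the Logs section above.
QUERY_ID=$(aws logs start-query \
  --log-group-name '<payforadoption-log-group>' \
  --start-time $(date -u -d '15 minutes ago' +%s) \
  --end-time $(date -u +%s) \
  --query-string 'fields @timestamp, @message | filter @message like /error/ | sort @timestamp desc | limit 20' \
  --query 'queryId' --output text)

# Fetch the results once the query completes
aws logs get-query-results --query-id $QUERY_ID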

Revert the application to normal behavior

Set the traffic generator count to 1 for regular load

  1. Navigate back to Cloud9 and execute the following commands in the terminal:
PETLISTADOPTIONS_CLUSTER=$(aws ecs list-clusters | jq '.clusterArns[]|select(contains("PetList"))' -r)
TRAFFICGENERATOR_SERVICE=$(aws ecs list-services --cluster $PETLISTADOPTIONS_CLUSTER | jq '.serviceArns[]|select(contains("trafficgenerator"))' -r)
aws ecs update-service --cluster $PETLISTADOPTIONS_CLUSTER --service $TRAFFICGENERATOR_SERVICE --desired-count 1
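
As before, you can optionally wait for the service to settle back to a single task before moving on:

# Optional: wait until the trafficgenerator service is stable again
aws ecs wait services-stable --cluster $PETLISTADOPTIONS_CLUSTER --services $TRAFFICGENERATOR_SERVICE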

Disable errormode to return to normal behavior

  1. Execute the following commands in the terminal:

This command updates the Systems Manager Parameter to turn off error mode in the PayForAdoption service.

aws ssm put-parameter --name '/petstore/errormode1' --value 'false' --overwrite

  1. Wait for a few minutes, then check out the CloudWatch Container Insights screen.

You should see that the memory utilization has stopped fluctuating.

Container Insights normal behavior

This concludes the Load Test & Troubleshoot module.