What problems does the AWS Fargate right sizing dashboard solve?
The metrics collected by Container Insights for ECS (which includes support for Fargate) aren’t granular enough to track single tasks. The available metrics (CpuReserved, CpuUtilized, MemoryReserved, MemoryUtilized) are all aggregated and averaged at the task definition family level. The assumption is that all running tasks within the same task definition family are evenly balanced (behind an ECS service) when they scale out, so averaging their configuration and consumption is acceptable. This holds in many situations, but not always. Here are some scenarios that challenge that assumption:
- The same task definition behind an ECS service is used for both dev and prod environments: the load on development environments is often much lower than on production environments. Averaging these inputs may produce a balanced-looking number that hides whether the dev environment is heavily under-utilized and/or the prod environment is heavily over-utilized
- The same task definition isn’t used by an ECS service but by batch-type workloads with different resource consumption profiles. Again, averaging these inputs may produce a misleading number that hides the need for different task definitions for different workload profiles
- The running tasks that are part of an ECS service behind a load balancer may not be evenly utilized. This could be because of a particular application pattern or because of a configuration issue. This dashboard isn’t meant to be a problem determination tool, but it could be a way to spot problems other than inefficiencies
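To make the averaging problem concrete, here is a minimal sketch using made-up numbers: one dev task and one prod task from the same task definition family, each reserving 1024 CPU units. The task names, values, and helper functions are hypothetical, and the family-level figure is computed as total utilized over total reserved, which is only an approximation of how an aggregated metric would look.

```shell
# Hypothetical sample data: task name, CPU units reserved, CPU units utilized.
tasks="dev-task 1024 51.2
prod-task 1024 972.8"

# Utilization of each task as a percentage of its own reservation.
per_task_utilization() {
  printf '%s\n' "$tasks" |
    awk '{ printf "%s %.1f%%\n", $1, 100 * $3 / $2 }'
}

# Family-level view: total utilized divided by total reserved.
family_utilization() {
  printf '%s\n' "$tasks" |
    awk '{ r += $2; u += $3 } END { printf "%.1f%%\n", 100 * u / r }'
}

per_task_utilization   # dev-task 5.0%, prod-task 95.0%
family_utilization     # 50.0%
```

The family-level figure of 50% looks healthy, while the per-task view shows one task almost idle and the other close to its reservation.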
The dashboard tries to address the following user stories.
As a user I would like to:
- see the total number of running Fargate tasks in the cluster
- see the total waste (aggregate of all running Fargate tasks) relative to actual consumption based on memory usage
- see the total waste (aggregate of all running Fargate tasks) relative to actual consumption based on CPU usage
- see the top 10 running Fargate tasks ordered by memory waste (i.e. the 10 tasks with the highest memory optimization opportunity)
- see the top 10 running Fargate tasks ordered by CPU waste (i.e. the 10 tasks with the highest CPU optimization opportunity)
- see the list of all running Fargate tasks ordered by waste, highest first, based on CPU usage and memory usage
- see the list of all running tasks with all the configuration and consumption details ordered by the task definition family name
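The "top tasks by waste" stories above can be sketched as a small ranking over per-task samples. This is only an illustration: the sample rows and the `top_waste` helper are hypothetical, and it assumes waste means reserved minus utilized as a share of reserved. The dashboard's own queries may define waste differently.

```shell
# Hypothetical per-task samples with columns:
# task id, family, CpuReserved, CpuUtilized, MemoryReserved, MemoryUtilized
sample_tasks="a1 web 1024 100 2048 1800
b2 web 1024 900 2048 512
c3 batch 512 256 1024 1000"

# top_waste RESERVED_COL UTILIZED_COL N
# Prints "task family waste%" for the N tasks with the highest waste,
# where waste% = 100 * (reserved - utilized) / reserved.
top_waste() {
  printf '%s\n' "$sample_tasks" |
    awk -v r="$1" -v u="$2" \
      '{ printf "%s %s %.0f\n", $1, $2, 100 * ($r - $u) / $r }' |
    sort -k3,3nr |
    head -n "$3"
}

top_waste 3 4 2   # ranked by CPU waste
top_waste 5 6 2   # ranked by memory waste
```

With this sample data the task with the most CPU waste (`a1`) is not the one with the most memory waste (`b2`), which is why the dashboard ranks the two dimensions separately.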
Execute the following commands to create the custom dashboard
These commands will create a custom CloudWatch dashboard that contains several widgets showing cluster resource performance.
PAYFORADOPTION_ECS_CLUSTER=$(aws cloudformation describe-stack-resources --stack-name "$STACK_NAME" | jq -r '.StackResources[] | select(.ResourceType == "AWS::ECS::Cluster") | select(.LogicalResourceId | contains("PayForAdoption")) | .PhysicalResourceId')
sed -i "s~CLUSTERNAME~$PAYFORADOPTION_ECS_CLUSTER~" ./resources/fargate-resize-dashboard.json
aws cloudwatch put-dashboard --dashboard-name fargate-right-sizing --dashboard-body file://./resources/fargate-resize-dashboard.json
Now go to CloudWatch dashboards to see the newly created dashboard.
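If the cluster lookup returns nothing (for example because STACK_NAME is unset), the sed step replaces CLUSTERNAME with an empty string and the dashboard would most likely show no data. As a hedged sketch, a small guard can refuse to substitute an empty name. The `render_dashboard` helper is hypothetical and not part of the workshop; it writes the result to standard output instead of editing the file in place.

```shell
# Hypothetical helper: render_dashboard CLUSTER_NAME TEMPLATE_FILE
# Prints the template with CLUSTERNAME replaced, or fails if the name is empty.
render_dashboard() {
  if [ -z "$1" ] || [ "$1" = "null" ]; then
    echo "cluster name is empty; check STACK_NAME and the jq filter" >&2
    return 1
  fi
  # Same substitution as the workshop command, without -i.
  sed "s~CLUSTERNAME~$1~" "$2"
}

# Example usage (paths as in the commands above):
# render_dashboard "$PAYFORADOPTION_ECS_CLUSTER" ./resources/fargate-resize-dashboard.json > /tmp/dashboard.json
```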