Ingest metrics from ECS

In the previous section, we collected metrics from EKS. In this module, we will collect application and platform metrics from services running on ECS with the AWS Distro for OpenTelemetry (ADOT) collector.

Query application metrics

Execute the following command to see the raw metrics from the application.

curl -s "$(aws ssm get-parameter --name /petstore/petlistadoptionsmetricsurl --query Parameter.Value --output text)" | grep petlistadoptions

You should see output similar to the sample below.

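The application exposes its metrics in the Prometheus exposition format. A hypothetical excerpt (the metric names and values below are illustrative, not the application's actual output):

# HELP petlistadoptions_http_requests_total Total HTTP requests served (illustrative)
# TYPE petlistadoptions_http_requests_total counter
petlistadoptions_http_requests_total{method="GET",status="200"} 1027
petlistadoptions_http_requests_total{method="GET",status="500"} 3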

Configure AWS Distro for OpenTelemetry (ADOT) collector

  1. Build an ECR image with a custom config

Let’s now use the ADOT collector to scrape those metrics. Execute the following script to build an ECR image with a custom configuration, called a pipeline, that collects application and ECS platform metrics.

WORKSPACE_ID=$(aws amp list-workspaces --alias observability-workshop | jq .workspaces[0].workspaceId -r)
ECR_REPOSITORY_URI=$(aws ecr create-repository --repository-name aws-otel-collector-petlistadoptions --query repository.repositoryUri --output text)
./resources/build-adot-ecr.sh $WORKSPACE_ID $ECR_REPOSITORY_URI $AWS_REGION
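
Before moving on, you can sanity-check that the image landed in the repository. This is an optional verification step, assuming the repository name used above:

aws ecr describe-images \
  --repository-name aws-otel-collector-petlistadoptions \
  --query 'imageDetails[].imageTags' --output text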

ADOT configuration walkthrough

The previous step created an Amazon ECR image with a custom configuration. The entire configuration describes a pipeline: the path the data follows in the collector, from reception, through further processing or modification, to finally exiting the collector via exporters. Let’s walk through the configuration section by section. Execute the following command to view the configuration file.

cat /tmp/aws-otel-collector-petlistadoptions/config.yaml 

Our pipeline consists of four sections:

  • receivers
  • processors
  • exporters
  • service

Receivers

Receivers are in charge of getting telemetry data into the collector, typically from the network. In this case, we have two receivers:

  • prometheus with a standard Prometheus configuration to scrape the application metrics
  • awsecscontainermetrics, which reads task metadata and docker stats from the Amazon ECS Task Metadata Endpoint and generates resource usage metrics (such as CPU, memory, network, and disk) from them

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
      - job_name: "test-prometheus-sample-app"
        static_configs:
        - targets: [ 0.0.0.0:80 ]
  awsecscontainermetrics:
    collection_interval: 15s

Visit the Prometheus scrape-config and ECS Container Metrics Receiver documentation pages for more configuration options.
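
If you are curious about the data the awsecscontainermetrics receiver works with, you can inspect it manually from inside a running task; the ECS agent injects the endpoint URL into every container. A sketch, which only works when run within an ECS task (platform version 1.4 or later):

# ECS_CONTAINER_METADATA_URI_V4 is injected by the ECS agent
curl -s "${ECS_CONTAINER_METADATA_URI_V4}/task" | jq .        # task metadata
curl -s "${ECS_CONTAINER_METADATA_URI_V4}/task/stats" | jq .  # docker stats per container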

Processors

A pipeline can contain sequentially connected processors. The first processor gets the data from one or more receivers configured for the pipeline, and the last processor sends the data to one or more exporters configured for the pipeline. After receiving the data, we use filter to keep exactly the metrics we expect from ECS, metricstransform to rename those metrics, and finally resource to rename resource attributes, which will be stored as metric labels.

processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.memory.usage
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.cpu.usage.vcpu
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes

  metricstransform:
    transforms:
      - metric_name: ecs.task.memory.utilized
        action: update
        new_name: MemoryUtilized
      - metric_name: ecs.task.memory.reserved
        action: update
        new_name: MemoryReserved
      - metric_name: ecs.task.memory.usage
        action: update
        new_name: MemoryUsage
      - metric_name: ecs.task.cpu.utilized
        action: update
        new_name: CpuUtilized
      - metric_name: ecs.task.cpu.reserved
        action: update
        new_name: CpuReserved
      - metric_name: ecs.task.cpu.usage.vcpu
        action: update
        new_name: CpuUsage
      - metric_name: ecs.task.network.rate.rx
        action: update
        new_name: NetworkRxBytes
      - metric_name: ecs.task.network.rate.tx
        action: update
        new_name: NetworkTxBytes
      - metric_name: ecs.task.storage.read_bytes
        action: update
        new_name: StorageReadBytes
      - metric_name: ecs.task.storage.write_bytes
        action: update
        new_name: StorageWriteBytes

  resource:
    attributes:
      - key: ClusterName
        from_attribute: aws.ecs.cluster.name
        action: insert
      - key: aws.ecs.cluster.name
        action: delete
      - key: ServiceName
        from_attribute: aws.ecs.service.name
        action: insert
      - key: aws.ecs.service.name
        action: delete
      - key: TaskId
        from_attribute: aws.ecs.task.id
        action: insert
      - key: aws.ecs.task.id
        action: delete
      - key: TaskDefinitionFamily
        from_attribute: aws.ecs.task.family
        action: insert
      - key: aws.ecs.task.family
        action: delete
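
Taken together, the three processors keep only the ten task-level metrics, rename them to CloudWatch-style names, and rewrite the resource attributes that end up as metric labels (thanks to the resource_to_telemetry_conversion exporter setting shown below). A hypothetical before/after for a single sample, with illustrative values:

# before processing, as emitted by awsecscontainermetrics
ecs.task.memory.utilized{aws.ecs.cluster.name="PetListAdoptions", aws.ecs.task.id="..."} 512

# after filter, metricstransform and resource
MemoryUtilized{ClusterName="PetListAdoptions", TaskId="..."} 512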

Exporters

Exporters are responsible for forwarding the data to one or multiple destinations. In this case, we use the awsprometheusremotewrite exporter to send the collected metrics to an AMP workspace.

exporters:
  awsprometheusremotewrite:
    endpoint: https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write
    aws_auth:
      region: eu-west-1
      service: aps
    resource_to_telemetry_conversion:
      enabled: true
  logging:
    loglevel: debug
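
The <workspace-id> placeholder is populated by the build script from the workspace ID we looked up earlier. If you want to check the endpoint it should contain, you can reconstruct it yourself (a sketch, reusing the variables from the build step):

WORKSPACE_ID=$(aws amp list-workspaces --alias observability-workshop --query 'workspaces[0].workspaceId' --output text)
echo "https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${WORKSPACE_ID}/api/v1/remote_write"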

Service

Finally, the service section glues everything together and defines the pipelines.

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, awsprometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter, metricstransform, resource]
      exporters: [logging, awsprometheusremotewrite]

Visit the OpenTelemetry collector design for more details.

Update ECS Service

  1. Navigate to the ECS console. In the Task Definitions menu, click on ServiceslistadoptionsservicetaskDefinition*, select the latest version, open the IAM task role in a new window (we will come back to it), and create a new revision.

  2. On the task definition edit page, scroll to the container definitions and click on aws-otel-collector.

  3. On the Image field, replace public.ecr.aws/aws-observability/aws-otel-collector:latest with the ECR image URI created earlier.

The script we used to build the image prints the image URI as part of its output, in a line similar to: The push refers to repository [123456789012.dkr.ecr.eu-west-1.amazonaws.com/aws-otel-collector-petlistadoptions].
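
If you no longer have that output at hand, the URI can be retrieved from ECR again. A sketch; the :latest tag is an assumption, so check which tag the script actually pushed:

ECR_URI=$(aws ecr describe-repositories \
  --repository-names aws-otel-collector-petlistadoptions \
  --query 'repositories[0].repositoryUri' --output text)
echo "${ECR_URI}:latest"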

Provide ECS Service with AMP write permissions

In order to successfully push metrics into the AMP workspace, the ECS task will need IAM permissions.

  1. Go to the previously opened tab with the Task IAM role. You can also search for listadoptionsservicetaskRole in the IAM console.

  2. Click Attach policies, search for AmazonPrometheusRemoteWriteAccess and proceed.

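Alternatively, the same policy can be attached from the command line. A sketch; the exact role name carries a generated suffix, so we look it up first:

ROLE_NAME=$(aws iam list-roles \
  --query "Roles[?contains(RoleName, 'listadoptionsservicetaskRole')].RoleName" --output text)
aws iam attach-role-policy \
  --role-name "$ROLE_NAME" \
  --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess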

Deploy the new task definition version

  1. Go back to the ECS console. Select the PetListAdoptions cluster, then select the listadoptionsservice service.

  2. Update the service to use the latest task definition revision and proceed with a new deployment.
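
If you prefer the CLI, the equivalent call looks like this (a sketch; cluster and service names as shown in the console, and the task definition family is an assumption to adjust to your environment):

aws ecs update-service \
  --cluster PetListAdoptions \
  --service listadoptionsservice \
  --task-definition <task-definition-family> \
  --force-new-deployment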

Viewing logs for the ADOT container in CloudWatch Logs

Open the CloudWatch console and navigate to CloudWatch Logs Insights. Select the log group /ecs/PetListAdoptions and run the following query:

fields @message
| filter @logStream like 'aws-otel-collector'
| sort @timestamp desc
| limit 100
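
You can also tail the same logs directly from a terminal. A sketch, assuming AWS CLI v2; the stream name prefix is an assumption based on the filter above, so adjust it if your log streams are named differently:

aws logs tail /ecs/PetListAdoptions \
  --log-stream-name-prefix aws-otel-collector \
  --since 15m --follow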

For more information on how to query logs using CloudWatch Logs Insights, please refer to this section of the workshop.

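To quickly confirm that metrics are reaching the AMP workspace, you can query its Prometheus-compatible API with awscurl, a SigV4-signing curl wrapper. A sketch, assuming awscurl is installed (pip install awscurl) and the WORKSPACE_ID variable from earlier:

awscurl --service aps --region "$AWS_REGION" \
  "https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${WORKSPACE_ID}/api/v1/query?query=MemoryUtilized"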

This concludes this section. To visualize the metrics collected, you may continue to the next sections covering self-managed Grafana and Amazon Managed Service for Grafana.