CloudWatch Logs: distribution to multiple destinations

EDIT:
Amazon CloudWatch Logs now supports two subscription filters per log group.

Scenario

CloudWatch Logs enables you to centralize the logs from all of the systems, applications, and AWS services that you use. You can search them for specific error codes or patterns, and filter them based on specific fields.

If you need a more sophisticated solution, you can forward logs to other systems. Common purposes are:

  • processing and alerting in a log management system,
  • application observability and tracing in an application performance monitoring system (for example, Lambda instrumentation),
  • archiving,
  • reporting.

These use cases require different approaches:

  • Real-time analysis, when you need results quickly, for example for alerts based on log entries.
  • Log exports, for example for log archiving and reporting.

Exporting CloudWatch Logs to S3

You can export log data from your log groups to an Amazon S3 bucket using a native AWS feature.
Log data can take up to 12 hours to become available for export. Note that exporting is a one-time task, so if you want to do it regularly, you need to write your own automation. An example can be found here.
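A one-off export boils down to a single CreateExportTask API call. Here is a minimal boto3 sketch; the log group and bucket names are hypothetical placeholders:

    import time
    import boto3

    logs = boto3.client("logs")

    # Hypothetical names -- replace with your own log group and S3 bucket.
    response = logs.create_export_task(
        taskName="daily-export",
        logGroupName="/aws/lambda/example-function",
        fromTime=int((time.time() - 24 * 3600) * 1000),  # 24 hours ago, in milliseconds
        to=int(time.time() * 1000),                      # now, in milliseconds
        destination="example-log-archive-bucket",
        destinationPrefix="exported-logs",
    )
    print(response["taskId"])

Scheduling a call like this, for example from a cron-triggered Lambda function, is one way to automate recurring exports.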

CloudWatch Logs subscription

You can use subscriptions to get access to a real-time feed of log events from CloudWatch Logs and have it delivered to other services, such as an Amazon Kinesis stream, an Amazon Kinesis Data Firehose delivery stream, or AWS Lambda, for custom processing, analysis, or loading into other systems. A subscription filter defines the filter pattern to use for selecting which log events get delivered to your AWS resource, as well as information about where to send the matching log events.

However, you can configure just one subscription filter per log group. To send logs to multiple destinations (e.g. multiple Lambda functions or external systems), you need to somehow fan out the log messages. Below you can find two options for doing that.

Kinesis Data Stream

Kinesis Data Streams is an Amazon streaming service that collects and processes large streams of data records in real time. You can set a stream as the CloudWatch Logs subscription destination. Different Lambda functions can then independently read logs from the stream and forward them to the appropriate systems.

CloudWatch Logs to Kinesis Data Stream
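Setting up the subscription is a single PutSubscriptionFilter call. A boto3 sketch follows; the ARNs are hypothetical, and the IAM role must allow the CloudWatch Logs service to put records into the stream:

    import boto3

    logs = boto3.client("logs")

    # Hypothetical ARNs -- the role must be assumable by the CloudWatch Logs
    # service and allow kinesis:PutRecord on the destination stream.
    logs.put_subscription_filter(
        logGroupName="/aws/lambda/example-function",
        filterName="to-kinesis",
        filterPattern="",  # an empty pattern forwards every log event
        destinationArn="arn:aws:kinesis:eu-west-1:123456789012:stream/example-log-stream",
        roleArn="arn:aws:iam::123456789012:role/CWLtoKinesisRole",
    )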

Event source mapping

How do you trigger a Lambda function that should read records from a Kinesis stream? The best way is to set up an event source mapping. It is an AWS Lambda resource that reads from an event source and invokes a Lambda function, so you can process items from a Kinesis Data Stream without invoking Lambda functions directly.
That said, you can now see that the picture above is a bit simplified: in fact, the AWS Lambda service is responsible for invoking the Lambda function.

Source: https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventsourcemapping.html
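Creating the mapping is again one API call. A boto3 sketch with a hypothetical stream ARN and function name:

    import boto3

    lambda_client = boto3.client("lambda")

    # Hypothetical stream ARN and function name.
    lambda_client.create_event_source_mapping(
        EventSourceArn="arn:aws:kinesis:eu-west-1:123456789012:stream/example-log-stream",
        FunctionName="example-log-forwarder",
        StartingPosition="LATEST",  # read only new records; TRIM_HORIZON starts at the oldest
        BatchSize=100,
    )

Each Lambda function gets its own event source mapping on the same stream, which is what makes the fan-out work.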

Batching

By default, Lambda invokes your function as soon as records are available in the stream. If the batch it reads from the stream only has one record in it, Lambda only sends one record to the function. To avoid invoking the function with a small number of records, you can tell the event source to buffer records for up to 5 minutes by configuring a batch window. Before invoking the function, Lambda continues to read records from the stream until it has gathered a full batch, or until the batch window expires.
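The batch window is a plain parameter on the mapping. A sketch using boto3; the UUID is a hypothetical placeholder for the one returned by create_event_source_mapping:

    import boto3

    lambda_client = boto3.client("lambda")

    # Hypothetical mapping UUID, returned when the mapping was created.
    lambda_client.update_event_source_mapping(
        UUID="11111111-2222-3333-4444-555555555555",
        BatchSize=500,
        MaximumBatchingWindowInSeconds=300,  # buffer records for up to 5 minutes
    )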

If your function returns an error, Lambda retries the batch until processing succeeds or the data expires. To avoid stalled shards, you can configure the event source mapping to retry with a smaller batch size, limit the number of retries, or discard records that are too old. To retain discarded events, you can configure the event source mapping to send details about failed batches to an SQS queue or SNS topic.
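These error-handling knobs are also parameters on the event source mapping. A hedged sketch, with a hypothetical UUID and SQS queue ARN:

    import boto3

    lambda_client = boto3.client("lambda")

    # Hypothetical UUID and queue ARN.
    lambda_client.update_event_source_mapping(
        UUID="11111111-2222-3333-4444-555555555555",
        BisectBatchOnFunctionError=True,   # retry failed batches in smaller halves
        MaximumRetryAttempts=3,            # limit the number of retries
        MaximumRecordAgeInSeconds=3600,    # discard records older than one hour
        DestinationConfig={
            "OnFailure": {
                "Destination": "arn:aws:sqs:eu-west-1:123456789012:failed-log-batches"
            }
        },
    )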

Alternative – Kinesis Data Firehose

Instead of using a Lambda function to process logs and send them to the appropriate destination, you can use Kinesis Data Firehose. It is limited, though, to the following destinations: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk. A great benefit of this solution is that you don't need to write any Lambda code.
You configure your Kinesis Data Stream to send data to Kinesis Data Firehose, and it automatically delivers the data to the destination that you specified. You can also configure Kinesis Data Firehose to transform your data before delivering it. Kinesis Data Firehose calls the Kinesis Data Streams GetRecords operation once per second for each shard.

Kinesis Data Stream to Kinesis Data Firehose
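Creating a delivery stream that reads from an existing Kinesis Data Stream and writes to S3 might look like the boto3 sketch below; all ARNs and names are hypothetical, and the roles need read access to the stream and write access to the bucket, respectively:

    import boto3

    firehose = boto3.client("firehose")

    # Hypothetical ARNs and names.
    firehose.create_delivery_stream(
        DeliveryStreamName="example-logs-to-s3",
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": "arn:aws:kinesis:eu-west-1:123456789012:stream/example-log-stream",
            "RoleARN": "arn:aws:iam::123456789012:role/FirehoseReadKinesisRole",
        },
        S3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/FirehoseWriteS3Role",
            "BucketARN": "arn:aws:s3:::example-log-archive-bucket",
        },
    )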

You might ask why not send CloudWatch Logs directly to Kinesis Data Firehose, since Firehose is a supported CloudWatch Logs subscription destination. You can, but again, a Firehose delivery stream supports only one destination.

Materials

Real-time Processing of Log Data with Subscriptions
Using AWS Lambda with Amazon Kinesis
Sending Data to an Amazon Kinesis Data Firehose Delivery Stream
Exporting Log Data to Amazon S3