Monitor AWS Network Traffic with VPC Flow Logs Using CloudWatch and AWS CDK
Flow logs are the native network logging layer for AWS. They can be set up to log IP traffic on subnets, network interfaces, or VPCs. VPC flow logs in particular contain a vast amount of IP traffic information and data points for our resources that can be leveraged for:
Monitoring boundaries for networks and AWS accounts
Detecting anomalous network activity
Catching unintentional cross-region data transfers early (to avoid unnecessary costs)
Identifying system optimizations based on AZ distribution
Performing various network traffic flow optimizations
In this article, we will:
Identify relevant data points with example use cases
Query traffic logs for specific data points
VPC flow log example architecture
In this example architecture, a flow log publishes the IP traffic of all resources in a VPC to a CloudWatch log group.
The flow log needs an IAM role with write-access for publishing the logs to CloudWatch.
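As a rough sketch of what that role involves, the two policy documents below follow the pattern in the AWS flow logs documentation: a trust policy that lets the VPC Flow Logs service (vpc-flow-logs.amazonaws.com) assume the role, and a permissions policy with the CloudWatch Logs write actions. The function names and the wildcard default resource are illustrative assumptions; in practice the resource would usually be narrowed where possible.

```python
import json


def flow_log_trust_policy() -> dict:
    """Trust policy allowing the VPC Flow Logs service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "vpc-flow-logs.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def flow_log_write_policy(resource: str = "*") -> dict:
    """Permissions policy granting write access to CloudWatch Logs.

    The default wildcard resource is an assumption made for brevity.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                    "logs:DescribeLogGroups",
                    "logs:DescribeLogStreams",
                ],
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print both documents as JSON, ready to paste into an IAM role definition.
    print(json.dumps(flow_log_trust_policy(), indent=2))
    print(json.dumps(flow_log_write_policy(), indent=2))
```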
How CloudWatch organizes VPC flow log data
VPC flow logs are published to CloudWatch in three steps:
A log group is created for archiving all flow log data
A log stream is created for each resource being monitored
Log events are created within each log stream with custom data points for IP traffic
Basically, a log group consists of log streams which consist of log events.
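That nesting can be pictured with a small, purely illustrative data model. The class names and sample values below are hypothetical and are not part of any AWS SDK; they only mirror the containment of events in streams and streams in a group.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogEvent:
    timestamp_ms: int  # when the event was recorded
    message: str       # the flow log record itself


@dataclass
class LogStream:
    name: str
    events: List[LogEvent] = field(default_factory=list)


@dataclass
class LogGroup:
    name: str
    streams: Dict[str, LogStream] = field(default_factory=dict)

    def put_event(self, stream_name: str, event: LogEvent) -> None:
        """Append an event, creating the stream on first use."""
        stream = self.streams.setdefault(stream_name, LogStream(stream_name))
        stream.events.append(event)


# Hypothetical names: one group, one stream per monitored resource.
group = LogGroup("my-vpc-flow-logs")
group.put_event("eni-aaaa", LogEvent(1, "record one"))
group.put_event("eni-aaaa", LogEvent(2, "record two"))
group.put_event("eni-bbbb", LogEvent(3, "record three"))

for stream in group.streams.values():
    print(stream.name, len(stream.events))
```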
Deploy VPC flow logs publishing to CloudWatch logs for near real-time analytics
Now that we have an idea about how flow logs work and how we can find our network data in CloudWatch, let's build some flow logs!
Deploying VPC flow logs with AWS CDK
AWS CDK allows us to define cloud application resources in code, using a supported language (TypeScript, Python, Go, etc.), which AWS CloudFormation then provisions and deploys in the background. To build our application we often use constructs, which are basic cloud components made of one or more resources.
There are a couple of ways we can set up flow logs with AWS CDK for Python:
The default AWS CDK FlowLog construct is a high-level flow log resource we can add to a CDK stack by instantiating it
At the time of writing, however, these options use the default log format and do not allow a custom one. A custom log format is crucial for choosing the specific data fields we want the logs to output for our network traffic.
The solution is to build a custom AWS CDK construct with the lower-level construct CfnFlowLog since it includes a log_format attribute. By building a custom construct based on CfnFlowLog, we can:
Customize the log format
Modularize the construct into its own file
Reuse the construct for any given VPC in an AWS CDK stack
Let's look at a custom FlowLog construct that implements this feature.
resource_id – Associates the flow log to a given VPC by ID
deliver_logs_permission_arn – Associates the IAM role by ARN (for granting write permission to the CloudWatch log group we created)
log_group_name – Identifies the log group to write flow logs to
log_format – Specifies the custom log format that appears on log events. The AWS documentation lists every data field that can be used in the format
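As a minimal sketch of how these four attributes fit together, the helper below assembles the keyword arguments that a custom construct could pass to ec2.CfnFlowLog. The function names, the field selection, and the sample values are assumptions for illustration, not the original construct. The keyword names follow the CfnFlowLog properties, and each field in log_format is written as ${field-name}, separated by spaces.

```python
from typing import Dict, Iterable

# Fields to include in each record; an illustrative selection.
DEFAULT_FIELDS = (
    "account-id", "interface-id", "region", "subnet-id",
    "srcaddr", "dstaddr", "srcport", "dstport",
    "flow-direction", "traffic-path", "action", "log-status",
)


def build_log_format(fields: Iterable[str] = DEFAULT_FIELDS) -> str:
    """Render field names in the ${field} syntax used by flow log formats."""
    return " ".join("${" + name + "}" for name in fields)


def flow_log_props(
    vpc_id: str,
    role_arn: str,
    log_group_name: str,
    fields: Iterable[str] = DEFAULT_FIELDS,
) -> Dict[str, str]:
    """Keyword arguments for ec2.CfnFlowLog (hypothetical helper)."""
    return {
        "resource_id": vpc_id,                    # the VPC to monitor
        "resource_type": "VPC",
        "traffic_type": "ALL",                    # accepted and rejected traffic
        "deliver_logs_permission_arn": role_arn,  # DeliverLogsPermissionArn
        "log_destination_type": "cloud-watch-logs",
        "log_group_name": log_group_name,
        "log_format": build_log_format(fields),
    }


# Placeholder identifiers, for illustration only.
props = flow_log_props(
    "vpc-0123456789abcdef0",
    "arn:aws:iam::123456789012:role/flow-log-role",
    "my-vpc-flow-logs",
)
print(props["log_format"])
```

Inside a construct's __init__, these could be unpacked as ec2.CfnFlowLog(self, "FlowLog", **props), with vpc.vpc_id, role.role_arn, and log_group.log_group_name supplying the values.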
Importing the FlowLog construct in a stack
A stack is a unit of deployment that AWS CloudFormation provisions; adding a stack to an app lets it be deployed to AWS. You can think of an app as consisting of multiple stacks, each of which consists of multiple resource constructs. We can import the custom flow log construct we just made into a stack to prepare it for deployment.
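from aws_cdk import Stack, aws_ec2 as ec2
from constructs import Construct

from flowlog import FlowLog


class MyStack(Stack):
    def __init__(self, scope: Construct, id: str) -> None:
        super().__init__(scope, id)
        # Create the VPC whose traffic will be logged
        self.vpc = ec2.Vpc(self, "MyVPC")
        # Attach the custom flow log construct to that VPC
        self.flow_log = FlowLog(self, "MyFlowLog", vpc=self.vpc)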
We can import the FlowLog construct from a local Python file (flowlog.py)
A new VPC construct is instantiated and passed to the flow log construct to specify which VPC the log data is generated for
The flow log construct is instantiated
Working with VPC flow log data fields
Identifying relevant data fields
There are plenty of data fields we can use for customizing our log format, as listed in the AWS documentation. Which fields are the right ones depends on your use case, as some will be more useful than others.
Here is a collection of data fields you may find useful for network traffic monitoring and security with sample use cases:
| Field | Summary | Example use cases |
| --- | --- | --- |
| account-id | The AWS account ID of the owner of a source network interface. | Identifying AWS users so that only trusted users are accessing specific resources from the VPC. |
| interface-id | The ID of the network interface (resource) whose IP traffic is being recorded. | Identifying which resource is being monitored in a flow log record. |
| region | The region that contains the network interface for which traffic is recorded. | Evaluating whether region-to-region transfers are being made which generally results in high latency, bogged-down bandwidth, and high costs. |
| subnet-id | The ID of the subnet that contains the network interface whose IP traffic is being recorded. | Ensuring resources are running in their proper subnets. |
| srcaddr | Source address of incoming traffic or IP address of network interface for outgoing traffic. | Verifying only trusted resources are sending data out or detecting incoming traffic as possible threats or unknown sources. |
| dstaddr | Destination address of outgoing traffic or IP address of network interface for incoming traffic. | Ensuring resources are only accessing verified destination addresses, or only trusted resources are being accessed. |
| srcport | Source port of traffic. | Ensuring that only trusted applications on a local resource are being used for accessing external resources, or vice versa. |
| dstport | Destination port of traffic. | Ensuring that only trusted applications on an external resource are accessing local resources, or vice versa. |
| flow-direction | Whether the traffic flow is ingress (incoming) or egress (outgoing). | Identifying only outgoing traffic by specifying egress within a CloudWatch Log Insights query. |
| traffic-path | A specific numerical value representing the path that egress traffic takes to its destination. | Verifying resources are using intended paths to their destination, such as a VPC gateway endpoint instead of a NAT gateway to lower S3/DynamoDB access costs. |
| action | Whether the traffic is accepted (ACCEPT) or rejected (REJECT). | Diagnosing traffic that may not be allowed by security groups or network ACLs, or packets arrived after a connection was closed. |
| log-status | Whether data logged normally (OK), no network traffic to/from the network interface (NODATA), or some flow log records were skipped (SKIPDATA). | Ensuring traffic logging is successful, detecting if resources are unable to transfer data with each other. |
A log event that uses the data fields above as a custom log format is a single line of space-separated values, one per field, in the order the format lists them.
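To make the shape of such a record concrete, here is a small sketch that splits one record into named fields. The field order matches the fields listed above, and the sample record is hypothetical: every value in it is a made-up placeholder, not real log output.

```python
# Field order of the custom log format (same order as the fields above).
FIELDS = [
    "account-id", "interface-id", "region", "subnet-id",
    "srcaddr", "dstaddr", "srcport", "dstport",
    "flow-direction", "traffic-path", "action", "log-status",
]

# Hypothetical record: every value is made up for illustration.
SAMPLE_RECORD = (
    "123456789012 eni-0123456789abcdef0 us-east-1 subnet-0123456789abcdef0 "
    "10.0.1.25 203.0.113.10 49152 443 egress 8 ACCEPT OK"
)


def parse_record(record: str, fields=FIELDS) -> dict:
    """Map the space-separated values of one record onto their field names."""
    values = record.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))


parsed = parse_record(SAMPLE_RECORD)
print(parsed["srcaddr"], "->", parsed["dstaddr"], parsed["action"])
```

All values stay as strings here; converting ports to integers or addresses to ipaddress objects is left to whatever analysis follows.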
CloudWatch Logs Insights can be used to query CloudWatch log events with SQL-like syntax. VPC flow logs can accumulate log events very quickly, so querying is useful for narrowing a log group's events down to the ones we are interested in, based on their data points.
For example, let's say we want to see recent outgoing traffic from a specific user's resources. Let's look for the 20 most recent log events where the user's account ID is 107530157253 and the traffic is outgoing or egress. We can run the following query:
fields @timestamp, @message, accountId as ID, flowDirection
| sort @timestamp desc
| filter (ID = '107530157253' and flowDirection = "egress")
| limit 20
fields specifies the values that are imported from a log event, where @message is the log data
accountId is a given value from the log event referenced in the query as ID
flowDirection specifies whether traffic is incoming (ingress) or outgoing (egress)
filter gets log events that match one or more conditions
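For running the same query from code, a minimal sketch is shown below. It only builds the query string and the request parameters; the function names, the log group name, and the time window are assumptions for illustration. The parameter names (logGroupName, startTime, endTime, queryString) are those of the CloudWatch Logs StartQuery API, which takes start and end times as epoch seconds.

```python
import time
from typing import Optional


def build_egress_query(account_id: str, limit: int = 20) -> str:
    """Query for the most recent egress events of one account."""
    return (
        "fields @timestamp, @message, accountId as ID, flowDirection\n"
        "| sort @timestamp desc\n"
        f"| filter (ID = '{account_id}' and flowDirection = \"egress\")\n"
        f"| limit {limit}"
    )


def start_query_params(
    log_group_name: str,
    query: str,
    minutes: int = 60,
    now: Optional[int] = None,
) -> dict:
    """Request parameters covering the last `minutes` minutes."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group_name,
        "startTime": end - minutes * 60,  # epoch seconds
        "endTime": end,
        "queryString": query,
    }


query = build_egress_query("107530157253")
# "my-vpc-flow-logs" and the fixed timestamp are placeholders.
params = start_query_params("my-vpc-flow-logs", query, minutes=15, now=1_700_000_000)
print(params["queryString"])
```

With boto3, these parameters could be passed as boto3.client("logs").start_query(**params), and the returned queryId polled with get_query_results until the query status is Complete.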
Conclusion
Enabling VPC flow logs that publish to CloudWatch logs has a multitude of benefits with the various data fields provided. Being able to directly monitor resources in a VPC and query data through flow logs can be a valuable addition to your networking toolset.
References
A full list of log event data fields can be found in the AWS documentation for VPC flow logs
Further detail on CloudWatch Logs Insights query syntax, along with sample queries, can be found in the CloudWatch Logs documentation