How to Succeed at Container Migrations on AWS
Migrating applications and infrastructure to the cloud requires careful planning and execution. Sizing infrastructure for production workloads during the planning phase may seem straightforward, but once testing begins it is sometimes discovered that the infrastructure was sized incorrectly. This is where monitoring, a common component of cloud architectures, can provide great value. Monitoring plays a crucial role in ensuring the stability, right-sizing, and optimal performance of cloud infrastructure and applications. Incorporating it early in a cloud migration allows engineers to identify problems in sizing assumptions and conduct performance tuning before production workloads move to the new environment. The result is properly sized and configured production infrastructure and applications that are more stable, perform better, and have fewer issues.
There are a number of monitoring solutions available. Third-party monitoring solutions such as Datadog and SolarWinds are robust platforms for monitoring both infrastructure and application performance. They also have a wide array of tools that assist in debugging and diagnosing problems. However, the cost they add to a company's budget may be prohibitive. AWS, on the other hand, provides powerful native tools through the CloudWatch service that can meet most monitoring needs without the licensing cost of a separate platform, although CloudWatch itself bills for custom metrics, log ingestion, and dashboards beyond its free tier. While it may not be as powerful as some third-party solutions, CloudWatch gives engineers the tools to ensure proper sizing and performance tuning of infrastructure and applications, leading to quicker and more streamlined migrations.
Infrastructure and application performance monitoring during cloud migrations serves several crucial purposes:
Performance Baseline: Establishing performance metrics before, during, and after migration helps identify any degradation or improved performance.
Resource Optimization: Proper monitoring ensures that resources are appropriately sized, preventing over-provisioning and unnecessary costs.
Issue Detection and Resolution: Real-time monitoring allows for quick identification and resolution of problems that may arise during the migration process. Some monitoring software also has advanced debugging and root-cause analysis capabilities to assist in issue resolution.
Where third-party monitoring solutions are not an option, AWS CloudWatch offers a suite of tools that can be used to monitor your AWS environment effectively, regardless of the operating system.
The CloudWatch Agent is a powerful tool for collecting detailed system-level metrics and logs from EC2 instances and on-premises servers, supporting both Linux and Windows.
Key features include:
procstat plugin to monitor application performance and resource usage
To install and configure the CloudWatch Agent on an EC2 instance using Terraform, one can add a user_data script section to the EC2 resource definition in your Terraform files as seen below. Here, we are assuming Ubuntu is the flavor of Linux being used:
resource "aws_instance" "example" {
ami = var.ami_id
instance_type = "t2.micro"
iam_instance_profile = aws_iam_instance_profile.cloudwatch_agent_profile.name
user_data = <<-EOF
#!/bin/bash
# Update and install prerequisites
apt-get update && apt-get install -y wget
# Install CloudWatch Agent
wget https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
dpkg -i ./amazon-cloudwatch-agent.deb
# Configure CloudWatch Agent
cat <<EOT > /opt/aws/amazon-cloudwatch-agent/bin/config.json
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"metrics_collected": {
"cpu": {
"measurement": [
"cpu_usage_idle",
"cpu_usage_iowait",
"cpu_usage_user",
"cpu_usage_system"
],
"metrics_collection_interval": 60,
"totalcpu": false
},
"disk": {
"measurement": [
"used_percent",
"inodes_free"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"procstat": [
{
"exe": "executable_name",
"measurement": [
"cpu_usage",
"memory_rss",
"read_bytes",
"write_bytes",
"read_count"
],
"metrics_collection_interval": 30
}
]
}
},
"logs": {
"logs_collected": {
"files": {
"collect_list": [
{
"file_path": "/var/log/syslog",
"log_group_name": "/var/log/syslog",
"log_stream_name": "{instance_id}",
"retention_in_days": 1827
}
]
}
}
}
}
EOT
# Start CloudWatch Agent
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
EOF
tags = {
Name = "CloudWatchAgentUbuntuExample"
}
}
resource "aws_iam_role" "cloudwatch_agent_role" {
name = "cloudwatch_agent_role"
assume_role_policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Principal = {
Service = "ec2.amazonaws.com"
},
Action = "sts:AssumeRole"
}
]
})
}
# aws_iam_role_policy_attachment attaches a policy to this role only;
# aws_iam_policy_attachment manages a policy's attachments exclusively and can detach it from other roles.
resource "aws_iam_role_policy_attachment" "cloudwatch_agent_server_policy" {
role = aws_iam_role.cloudwatch_agent_role.name
policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}
resource "aws_iam_role_policy_attachment" "cloudwatch_agent_admin_policy" {
role = aws_iam_role.cloudwatch_agent_role.name
policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentAdminPolicy"
}
resource "aws_iam_instance_profile" "cloudwatch_agent_profile" {
name = "cloudwatch_agent_instance_profile"
role = aws_iam_role.cloudwatch_agent_role.name
}
For Windows instances, you can use a similar approach with PowerShell commands in the user_data section. The PowerShell script must be contained within the <powershell> and </powershell> tags as seen below.
resource "aws_instance" "windows_example" {
ami = var.windows_ami_id
instance_type = "t2.micro"
iam_instance_profile = aws_iam_instance_profile.cloudwatch_agent_profile.name
user_data = <<-EOF
<powershell>
# Download and install CloudWatch Agent
$url = "https://amazoncloudwatch-agent.s3.amazonaws.com/windows/amd64/latest/amazon-cloudwatch-agent.msi"
$output = "$env:TEMP\amazon-cloudwatch-agent.msi"
Invoke-WebRequest -Uri $url -OutFile $output
Start-Process msiexec.exe -Wait -ArgumentList "/i $output /qn"
# Configure CloudWatch Agent
$config = @"
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "System"
},
"metrics": {
"metrics_collected": {
"Processor": {
"measurement": [
"% Idle Time",
"% User Time",
"% Processor Time"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"LogicalDisk": {
"measurement": [
"% Free Space"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"Memory": {
"measurement": [
"% Committed Bytes In Use"
],
"metrics_collection_interval": 60
},
"procstat": [
{
"exe": "executable_name",
"measurement": [
"cpu_usage",
"memory_rss",
"read_bytes",
"write_bytes",
"read_count"
],
"metrics_collection_interval": 30
}
]
}
},
"logs": {
"logs_collected": {
"windows_events": {
"collect_list": [
{
"event_name": "System",
"event_levels": [
"ERROR",
"WARNING"
],
"log_group_name": "System",
"log_stream_name": "{instance_id}",
"retention_in_days": 1827
}
]
}
}
}
}
"@
$config | Out-File -FilePath "$env:ProgramData\Amazon\AmazonCloudWatchAgent\config.json" -Encoding ASCII
# Start CloudWatch Agent
& $Env:ProgramFiles\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1 -a fetch-config -m ec2 -s -c file:$env:ProgramData\Amazon\AmazonCloudWatchAgent\config.json
</powershell>
EOF
tags = {
Name = "WindowsCloudWatchAgentExample"
}
}
# The IAM role, policy attachments, and instance profile are the same as in the Linux example.
# Terraform rejects duplicate resource definitions within one module, so declare them only once when both instances share a configuration.
In both configurations, we've added the procstat section under metrics_collected. This allows you to monitor specific processes by their executable or process name. Replace executable_name with the actual executable name or process name you want to monitor. Note that some of the metric names differ between Windows and Linux, per the AWS documentation.
It is also important to point out that the appropriate IAM role and policies need to be attached to the EC2 instance so that the CloudWatch Agent can collect the desired metrics and report them back to AWS. In the examples above, both the CloudWatchAgentServerPolicy and the CloudWatchAgentAdminPolicy managed policies are attached to the instance's role for this purpose.
CloudWatch Logs allows you to centralize logs from various AWS services and applications, which is crucial for troubleshooting and for maintaining a comprehensive view of your system's health. Automating log collection to a centralized location is a best practice because it ensures that all data is collected and nothing is forgotten. Retention policies can also be put in place so that logs are kept for as long as legal and regulatory requirements demand. In the above Terraform examples the retention period is set to about five years, but it could be as little as 1 day if desired. CloudWatch Logs accepts only specific retention values (for example 1, 7, 30, 365, or 1827 days), so the chosen number should be checked against the documented list.
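Where the log group itself is managed in Terraform, its retention can be declared alongside it. The following is a minimal sketch, assuming the log group name from the Linux example above; the resource label syslog is only illustrative, and 1827 days is the accepted retention value closest to five years.

```hcl
# Sketch: manage the log group and its retention in Terraform.
# The label "syslog" is a hypothetical name chosen for this example.
resource "aws_cloudwatch_log_group" "syslog" {
  name              = "/var/log/syslog"
  retention_in_days = 1827
}
```

Declaring the group here keeps the retention setting under version control with the rest of the infrastructure, while the agent continues to create the per-instance log streams inside it.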
CloudWatch Dashboards provide a customizable view of your metrics and alarms. They are essential for visualizing the health and performance of your migrated applications and infrastructure. AWS automatically provides a monitoring view for each EC2 instance in the EC2 portion of the AWS Console. However, by default this view does not display custom metrics configured in the CloudWatch Agent config file, such as metrics for specific processes or executables running on the instance. It is also a best practice to keep all dashboards that monitor infrastructure and application performance in one place, which CloudWatch Dashboards makes possible. Dashboards can be created in the AWS console or with Terraform. Below are examples of Terraform resources that create a dashboard automatically, one for the Linux instance and one for the Windows instance.
resource "aws_cloudwatch_dashboard" "example_dashboard" {
dashboard_name = "example-instance-dashboard"
dashboard_body = jsonencode({
widgets = [
{
type = "metric",
x = 0,
y = 0,
width = 12,
height = 6,
properties = {
metrics = [
["CWAgent", "cpu_usage_idle", "InstanceId", aws_instance.example.id],
["CWAgent", "cpu_usage_user", "InstanceId", aws_instance.example.id],
["CWAgent", "cpu_usage_system", "InstanceId", aws_instance.example.id]
],
period = 60,
stat = "Average",
region = var.region,
title = "CPU Usage"
}
},
{
type = "metric",
x = 0,
y = 6,
width = 12,
height = 6,
properties = {
metrics = [
["CWAgent", "mem_used_percent", "InstanceId", aws_instance.example.id]
],
period = 60,
stat = "Average",
region = var.region,
title = "Memory Usage"
}
},
{
type = "metric",
x = 0,
y = 12,
width = 12,
height = 6,
properties = {
metrics = [
["CWAgent", "disk_used_percent", "InstanceId", aws_instance.example.id]
],
period = 60,
stat = "Average",
region = var.region,
title = "Disk Usage"
}
},
{
type = "metric",
x = 0,
y = 18,
width = 12,
height = 6,
properties = {
metrics = [
["CWAgent", "procstat_cpu_usage", "exe", "executable_name", "InstanceId", aws_instance.example.id],
["CWAgent", "procstat_memory_rss", "exe", "executable_name", "InstanceId", aws_instance.example.id]
],
period = 30,
stat = "Average",
region = var.region,
title = "Process Metrics"
}
}
]
})
}
resource "aws_cloudwatch_dashboard" "windows_dashboard" {
dashboard_name = "windows-instance-dashboard"
dashboard_body = jsonencode({
widgets = [
{
type = "metric",
x = 0,
y = 0,
width = 12,
height = 6,
properties = {
metrics = [
["CWAgent", "Processor % Idle Time", "InstanceId", aws_instance.windows_example.id],
["CWAgent", "Processor % User Time", "InstanceId", aws_instance.windows_example.id],
["CWAgent", "Processor % Processor Time", "InstanceId", aws_instance.windows_example.id]
],
period = 60,
stat = "Average",
region = var.region,
title = "CPU Usage"
}
},
{
type = "metric",
x = 0,
y = 6,
width = 12,
height = 6,
properties = {
metrics = [
["CWAgent", "Memory % Committed Bytes In Use", "InstanceId", aws_instance.windows_example.id]
],
period = 60,
stat = "Average",
region = var.region,
title = "Memory Usage"
}
},
{
type = "metric",
x = 0,
y = 12,
width = 12,
height = 6,
properties = {
metrics = [
["CWAgent", "LogicalDisk % Free Space", "InstanceId", aws_instance.windows_example.id]
],
period = 60,
stat = "Average",
region = var.region,
title = "Disk Usage"
}
},
{
type = "metric",
x = 0,
y = 18,
width = 12,
height = 6,
properties = {
metrics = [
["CWAgent", "procstat cpu_usage", "exe", "executable_name", "InstanceId", aws_instance.windows_example.id],
["CWAgent", "procstat memory_rss", "exe", "executable_name", "InstanceId", aws_instance.windows_example.id]
],
period = 30,
stat = "Average",
region = var.region,
title = "Process Metrics"
}
}
]
})
}
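In the list of metrics, under properties, one can place the names of any desired metrics to display in the dashboard, including any custom metrics configured in the CloudWatch Agent config file. Each entry must match a published metric exactly, dimensions included: a widget that lists only InstanceId stays empty if the agent publishes that metric under a different set of dimensions (for example host, or per-CPU and per-device dimensions), so it is worth confirming the names and dimensions in the Metrics view of the CloudWatch console first.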
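One way to make the agent publish series that carry InstanceId as their only dimension is to set append_dimensions and aggregation_dimensions in the metrics section of the agent configuration. The following is a sketch under stated assumptions rather than the configuration used above: the local value names are hypothetical, and only the mem and disk sections are shown.

```hcl
locals {
  # Hypothetical "metrics" section for the agent config.
  cloudwatch_agent_metrics = {
    # "$${" is Terraform's escape for a literal "${"; the agent itself
    # resolves the aws:InstanceId placeholder at runtime.
    append_dimensions = {
      InstanceId = "$${aws:InstanceId}"
    }

    # Also publish each metric rolled up to InstanceId alone, which is
    # the dimension set the dashboard widgets above query.
    aggregation_dimensions = [["InstanceId"]]

    metrics_collected = {
      mem = {
        measurement = ["mem_used_percent"]
      }
      disk = {
        measurement = ["used_percent"]
        resources   = ["*"]
      }
    }
  }

  # JSON document that can be written to the agent's config.json.
  cloudwatch_agent_config = jsonencode({
    agent   = { metrics_collection_interval = 60 }
    metrics = local.cloudwatch_agent_metrics
  })
}
```

The encoded string can be interpolated into user_data as ${local.cloudwatch_agent_config}. If it is written to disk from a bash heredoc, quote the delimiter (<<'EOT') so that the shell leaves the ${aws:InstanceId} placeholder untouched.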
Establish Baselines: Before migration, document current performance metrics to compare post-migration.
Implement Gradual Monitoring: Start with basic metrics and gradually add more detailed monitoring as you progress.
Use Alarms Judiciously: Set up CloudWatch Alarms for critical metrics to receive immediate notifications of issues.
Leverage CloudWatch Insights: Use CloudWatch Logs Insights for ad-hoc analysis of log data during and after migration.
Automate Remediation: Where possible, use AWS Lambda in conjunction with CloudWatch Events (now Amazon EventBridge) to automate responses to common issues.
Monitor Specific Processes: Utilize the procstat plugin to monitor critical application processes, ensuring they are running correctly and efficiently.
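As an illustration of the alarm practice above, the following sketch raises an alarm when memory use on the Linux example instance stays high. The SNS topic, the resource labels, and the 90 percent threshold are assumptions made for this example, and the alarm only receives data if the agent publishes mem_used_percent with InstanceId as its sole dimension.

```hcl
# Hypothetical SNS topic that receives alarm notifications.
resource "aws_sns_topic" "migration_alerts" {
  name = "migration-alerts"
}

# Alarm when average memory use exceeds 90% for five consecutive one-minute periods.
resource "aws_cloudwatch_metric_alarm" "high_memory" {
  alarm_name          = "example-high-memory"
  alarm_description   = "Memory use above 90% for 5 minutes"
  namespace           = "CWAgent"
  metric_name         = "mem_used_percent"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "GreaterThanThreshold"
  threshold           = 90

  # Assumption: the metric is published with only an InstanceId dimension.
  dimensions = {
    InstanceId = aws_instance.example.id
  }

  alarm_actions = [aws_sns_topic.migration_alerts.arn]
}
```

Requiring several consecutive breaching periods keeps short spikes during load tests from paging anyone, which fits the advice to use alarms judiciously.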
While third-party monitoring tools can offer additional features, AWS CloudWatch provides a robust, cost-effective monitoring solution during cloud migrations. By leveraging CloudWatch Agent with the procstat plugin, CloudWatch Logs, and CloudWatch Dashboards, organizations can gain comprehensive visibility into their migrated applications and infrastructure with minimal cost impact.
Remember, effective monitoring is not just about collecting data. It is about deriving actionable insights to ensure a smooth, efficient, and successful migration to AWS. With CloudWatch's capabilities for both Linux and Windows environments, including process-level monitoring, you can confidently monitor and manage your infrastructure throughout the migration process and beyond.
Useful Resources for CloudWatch Agent Setup and Configuration: