Cost Optimization Strategies for AWS Serverless Applications in 2025
Serverless promised “pay only for what you use.” Then your AWS bill arrived, and it turns out you are using a lot more than you thought. Hidden Lambda invocations, noisy logs, and inefficient memory settings can quietly eat your budget.
This post walks through practical, production-tested ways to optimize costs for AWS serverless workloads in 2025: right-sizing Lambda, using Graviton2, batching, event filtering, and keeping CloudWatch spend under control.
Understand Where Serverless Costs Come From
For most serverless architectures, these are the main cost drivers:
Lambda compute (GB‑seconds) and request count
API Gateway or AppSync requests
CloudWatch Logs and metrics
Event sources (SQS, Kinesis, DynamoDB Streams)
Data transfer between regions or the public internet
The goal is not just to spend less, but to spend efficiently: the same or better performance for less money.
1. Right‑Size Lambda with Power Tuning
Lambda compute is billed in GB-seconds: configured memory multiplied by billed duration. More memory costs more per millisecond, but Lambda allocates CPU in proportion to memory, so duration often drops, sometimes enough to make a higher memory setting cheaper overall.
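To make the trade-off concrete, here is a minimal sketch of the arithmetic. The prices are assumed us-east-1 x86 on-demand rates and the durations are hypothetical measurements, so treat the numbers as placeholders rather than a benchmark.

```python
import math

# Assumed on-demand prices (us-east-1, x86); check the current pricing page.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Cost of one invocation: compute (GB-seconds) plus the request fee."""
    billed_ms = math.ceil(duration_ms)  # duration is billed in 1 ms increments
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST


# Hypothetical average durations observed at each memory size.
measurements = {128: 2400.0, 512: 520.0, 1024: 250.0, 2048: 190.0}

costs = {mb: invocation_cost(mb, ms) for mb, ms in measurements.items()}
cheapest_mb = min(costs, key=costs.get)

for mb, cost in costs.items():
    print(f"{mb:>5} MB  {measurements[mb]:>6.0f} ms  ${cost * 1_000_000:.2f} per 1M invocations")
print("cheapest:", cheapest_mb, "MB")
```

With these made-up durations the 1024 MB setting is both faster and cheaper than 128 MB, while 2048 MB is faster still but costs more. That is the curve Power Tuning measures for you with real invocations.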
Use AWS Lambda Power Tuning
AWS Lambda Power Tuning is a Step Functions state machine that benchmarks your function across multiple memory sizes and recommends the best configuration for cost, speed, or a balance of both.
# Clone and deploy Lambda Power Tuning
git clone https://github.com/alexcasalboni/aws-lambda-power-tuning.git
cd aws-lambda-power-tuning
sam deploy --guided
Run a tuning execution:
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:REGION:ACCOUNT:stateMachine:powerTuningStateMachine \
  --input '{
    "lambdaARN": "arn:aws:lambda:REGION:ACCOUNT:function:my-function",
    "powerValues": [128, 256, 512, 1024, 1536, 2048, 3008],
    "num": 50,
    "payload": {},
    "parallelInvocation": true,
    "strategy": "cost"
  }'
When tuning is done, open the visualization URL from the execution output to see cost vs duration per memory setting and choose the optimal value.
Apply the Recommended Memory in IaC
import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";

// CDK: apply the recommended memory size
const fn = new lambda.Function(this, "OptimizedFn", {
  runtime: lambda.Runtime.PYTHON_3_12,
  handler: "app.handler",
  code: lambda.Code.fromAsset("lambda"),
  memorySize: 1024, // example; use the value Power Tuning recommends
  timeout: cdk.Duration.seconds(10),
});
2. Use Graviton2 (ARM64) for Up to 20% Savings
Migrating Lambda to ARM-based Graviton2 can reduce compute cost, since ARM64 duration is priced about 20% lower per GB-second than x86, while often improving performance, especially for CPU-bound workloads.
import * as lambda from "aws-cdk-lib/aws-lambda";

const gravitonFn = new lambda.Function(this, "GravitonFn", {
  runtime: lambda.Runtime.NODEJS_20_X,
  architecture: lambda.Architecture.ARM_64,
  handler: "index.handler",
  code: lambda.Code.fromAsset("lambda"),
  memorySize: 1024,
});
Before switching all functions:
Verify your language and dependencies support ARM64
Run load tests to confirm latency and error rates
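Once you have load-test numbers for both architectures, a quick calculation shows whether the switch pays off. This sketch assumes us-east-1 on-demand prices per GB-second and uses hypothetical durations; swap in your own measurements.

```python
# Assumed on-demand prices per GB-second (us-east-1); verify for your region.
X86_PRICE = 0.0000166667
ARM_PRICE = 0.0000133334


def compute_cost(price_per_gb_second: float, memory_mb: int, duration_ms: float) -> float:
    """Compute cost of one invocation, excluding the per-request fee."""
    return price_per_gb_second * (memory_mb / 1024) * (duration_ms / 1000)


def arm_saving_pct(memory_mb: int, x86_ms: float, arm_ms: float) -> float:
    """Percentage saved on compute by moving to ARM64 (negative means ARM costs more)."""
    x86 = compute_cost(X86_PRICE, memory_mb, x86_ms)
    arm = compute_cost(ARM_PRICE, memory_mb, arm_ms)
    return (x86 - arm) / x86 * 100


# Hypothetical load-test results for a 1024 MB function.
print(round(arm_saving_pct(1024, x86_ms=200, arm_ms=200), 1))  # same duration on both
print(round(arm_saving_pct(1024, x86_ms=200, arm_ms=260), 1))  # ARM 30% slower
```

At these prices ARM stays cheaper as long as the function runs less than about 25% slower than on x86, so a workload that regresses badly on ARM64 can end up costing more.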
3. Reduce Unnecessary Invocations with Event Filtering
Every Lambda invocation has a cost, even if the handler just returns immediately. Event filtering lets you drop irrelevant events before they trigger Lambda, which lowers both compute and request costs.
S3 Event Filtering
import * as s3 from "aws-cdk-lib/aws-s3";
import * as s3n from "aws-cdk-lib/aws-s3-notifications";

bucket.addEventNotification(
  s3.EventType.OBJECT_CREATED,
  new s3n.LambdaDestination(processImageFn),
  {
    prefix: "uploads/",
    suffix: ".jpg", // ignore non-image files
  }
);
DynamoDB Streams Filtering
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as lambdaEventSources from "aws-cdk-lib/aws-lambda-event-sources";

const streamSource = new lambdaEventSources.DynamoEventSource(table, {
  startingPosition: lambda.StartingPosition.LATEST,
  batchSize: 100,
  filters: [
    lambda.FilterCriteria.filter({
      eventName: lambda.FilterRule.isEqual("INSERT"),
      dynamodb: {
        NewImage: {
          status: { S: lambda.FilterRule.isEqual("PENDING") },
        },
      },
    }),
  ],
});

processorFn.addEventSource(streamSource);
By filtering to only INSERT events with status = PENDING, you avoid paying for updates and deletes that your logic does not care about.
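If you want to sanity-check a filter before deploying it, you can approximate the matching locally. The pattern below is what I believe the CDK filter above serializes to. The `matches` helper is a hypothetical, equality-only approximation written for this post, not the matching engine Lambda actually runs.

```python
# Filter pattern assumed to be equivalent to the CDK FilterCriteria above.
PATTERN = {
    "eventName": ["INSERT"],
    "dynamodb": {"NewImage": {"status": {"S": ["PENDING"]}}},
}


def matches(pattern: dict, record: dict) -> bool:
    """Equality-only check: each leaf list in the pattern must contain the record's value."""
    for key, expected in pattern.items():
        if key not in record:
            return False
        actual = record[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical stream records, trimmed to the fields the filter looks at.
new_pending = {"eventName": "INSERT", "dynamodb": {"NewImage": {"status": {"S": "PENDING"}}}}
updated = {"eventName": "MODIFY", "dynamodb": {"NewImage": {"status": {"S": "PENDING"}}}}
deleted = {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "42"}}}}

for name, record in [("insert", new_pending), ("modify", updated), ("remove", deleted)]:
    print(name, "->", "invoke" if matches(PATTERN, record) else "dropped")
```

Only the first record would reach the function. The other two are discarded by the event source mapping, so they never count as invocations.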
4. Batch Workloads with SQS and Kinesis
High‑volume event processing can explode your Lambda request count. Batching multiple messages per invocation drastically reduces requests and can improve overall cost efficiency.
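The effect on request count is easy to estimate. This sketch assumes the us-east-1 on-demand request price and a best case where every batch is full; real batches are often smaller, so treat the result as an upper bound on the savings.

```python
import math

PRICE_PER_REQUEST = 0.20 / 1_000_000  # assumed on-demand request price (us-east-1)


def invocations(messages: int, batch_size: int) -> int:
    """Best case: every batch is full, so invocations = ceil(messages / batch_size)."""
    return math.ceil(messages / batch_size)


def request_cost(messages: int, batch_size: int) -> float:
    """Request charges only; compute (GB-seconds) is billed separately."""
    return invocations(messages, batch_size) * PRICE_PER_REQUEST


monthly_messages = 100_000_000  # hypothetical monthly volume
for size in (1, 10, 100):
    print(f"batch size {size:>3}: {invocations(monthly_messages, size):>11,} invocations, "
          f"${request_cost(monthly_messages, size):.2f} in request charges")
```

Compute cost does not shrink by the same factor, because each invocation now does more work. What you save is the request fee and the fixed per-invocation overhead such as initialization and connection setup.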
SQS + Lambda Batching
import * as cdk from "aws-cdk-lib";
import * as lambdaEventSources from "aws-cdk-lib/aws-lambda-event-sources";
import * as sqs from "aws-cdk-lib/aws-sqs";

const queue = new sqs.Queue(this, "OrdersQueue");

ordersProcessorFn.addEventSource(
  new lambdaEventSources.SqsEventSource(queue, {
    batchSize: 10, // up to 10 messages/invoke
    maxBatchingWindow: cdk.Duration.seconds(5),
  })
);
# lambda/orders_processor.py
import json


def handler(event, context):
    for record in event["Records"]:
        msg = json.loads(record["body"])
        process_order(msg)


def process_order(message: dict) -> None:
    # Your business logic here
    ...
For streaming workloads, configure larger batch sizes and windows to amortize invocation overhead across more records with Kinesis or DynamoDB Streams.
5. Control CloudWatch Logs Spend
It is common to spend more on CloudWatch Logs than on the Lambda itself if you log verbosely and retain logs forever.
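A rough estimate shows where the money goes. The prices below are assumed us-east-1 rates for the standard log class, and the model is deliberately simple: storage is billed on compressed size, so the storage figure here overstates the real charge.

```python
# Assumed CloudWatch Logs prices (us-east-1, standard class); verify for your region.
INGEST_PER_GB = 0.50
STORAGE_PER_GB_MONTH = 0.03


def monthly_log_cost(gb_per_day: float, retention_days: int, days_in_month: int = 30) -> dict:
    """Approximate steady-state monthly cost for one log group."""
    ingestion = gb_per_day * days_in_month * INGEST_PER_GB
    # At steady state roughly `retention_days` worth of logs are stored at any time.
    stored_gb = gb_per_day * retention_days
    storage = stored_gb * STORAGE_PER_GB_MONTH
    return {"ingestion": ingestion, "storage": storage, "total": ingestion + storage}


# Hypothetical function that writes 5 GB of logs per day.
for days in (7, 30, 365):
    cost = monthly_log_cost(5, days)
    print(f"{days:>3} days retention: ingestion ${cost['ingestion']:.2f}, "
          f"storage ${cost['storage']:.2f}, total ${cost['total']:.2f}")
```

Under these assumptions ingestion dominates at short retention periods, so logging less matters more than retention alone. Storage only becomes significant once logs are kept for many months.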
Set Log Retention
import * as cdk from "aws-cdk-lib";
import * as logs from "aws-cdk-lib/aws-logs";

new logs.LogGroup(this, "FnLogs", {
  logGroupName: `/aws/lambda/${fn.functionName}`,
  retention: logs.RetentionDays.ONE_WEEK, // not forever
  removalPolicy: cdk.RemovalPolicy.DESTROY,
});
Use Log Levels
import os

from aws_lambda_powertools import Logger

logger = Logger(level=os.getenv("LOG_LEVEL", "INFO"))


def handler(event, context):
    logger.debug("Very chatty debug log")  # omitted in production by default
    logger.info("Handling request", extra={"records": len(event.get("Records", []))})
Set LOG_LEVEL=ERROR in production for very high‑volume functions to keep logs and bills under control.
6. Manage Concurrency to Avoid Cost Spikes
Lambda scales automatically, which is great for traffic spikes but also great for unexpected cost spikes. Reserved concurrency and SQS configuration help control this.
Reserved Concurrency
import * as lambda from "aws-cdk-lib/aws-lambda";

const paymentFn = new lambda.Function(this, "PaymentFn", {
  runtime: lambda.Runtime.PYTHON_3_12,
  handler: "payment.handler",
  code: lambda.Code.fromAsset("lambda"),
  reservedConcurrentExecutions: 50, // cap to protect downstreams and budget
});
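When Lambda is triggered by SQS, set the queue’s visibility timeout well above the function timeout; AWS recommends at least six times the function timeout, plus the batching window if you use one. If it is too short, messages become visible again while they are still being processed, which causes duplicate processing and wasted invocations.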
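As a rule of thumb, AWS guidance for SQS event sources is a visibility timeout of at least six times the function timeout, plus the batching window. The helper names below are hypothetical; this is a small sketch for checking a configuration, assuming that guidance applies to your setup.

```python
SQS_MAX_VISIBILITY_TIMEOUT_S = 12 * 60 * 60  # SQS allows at most 12 hours


def recommended_visibility_timeout(function_timeout_s: int, batching_window_s: int = 0) -> int:
    """Six times the function timeout plus the batching window, in seconds."""
    return 6 * function_timeout_s + batching_window_s


def is_visibility_timeout_safe(visibility_s: int, function_timeout_s: int,
                               batching_window_s: int = 0) -> bool:
    """True if the queue setting meets the recommendation and fits within the SQS limit."""
    needed = recommended_visibility_timeout(function_timeout_s, batching_window_s)
    return needed <= visibility_s <= SQS_MAX_VISIBILITY_TIMEOUT_S


# Example: a 10 s function timeout with a 5 s batching window.
print(recommended_visibility_timeout(10, 5))
print(is_visibility_timeout_safe(30, 10, 5))  # the SQS default of 30 s is too short here
```

For a function with a 10-second timeout and a 5-second batching window this suggests 65 seconds, which means the SQS default of 30 seconds would need to be raised.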
7. Use Savings Plans for Steady Workloads
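If you run substantial Lambda and Fargate workloads continuously, Compute Savings Plans can reduce cost by roughly 10-20% or more in exchange for a 1- or 3-year commitment. AWS advertises up to 17% for Lambda, and the exact discount depends on the service, term, and payment option.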
Use Cost Explorer to identify steady monthly compute spend
Start with a conservative 1-year plan at ~50–60% of your baseline
Re‑evaluate once you’ve applied the optimizations in this post
Savings Plans apply automatically to eligible usage; no code changes are required.
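Savings Plans are purchased as an hourly dollar commitment, so the sizing exercise above boils down to a small calculation. This sketch uses hypothetical hourly spend figures and treats the lowest observed hour as the steady baseline; a low percentile would be a reasonable alternative if your data has outages or gaps.

```python
def conservative_commitment(hourly_spend: list[float], coverage: float = 0.5) -> float:
    """Hourly commitment as a fraction of the lowest observed hourly compute spend."""
    if not hourly_spend:
        raise ValueError("need at least one hourly spend data point")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    baseline = min(hourly_spend)  # spend you incur even in the quietest hour
    return round(baseline * coverage, 2)


# Hypothetical hourly compute spend in USD, e.g. exported from Cost Explorer.
spend = [4.00, 6.50, 5.20, 9.10, 4.30, 7.80]
print(conservative_commitment(spend, coverage=0.5))
print(conservative_commitment(spend, coverage=0.6))
```

Committing below the baseline means the plan is fully used every hour. You can add a second plan later once the optimizations have settled and the new baseline is clear.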
8. Data Transfer and Architecture Choices
Not all serverless cost is Lambda. Cross‑region and internet egress can be surprisingly expensive.
To control this:
Keep services that talk to each other in the same region when possible
Avoid routing traffic through the public internet when a VPC endpoint is available
Store and process data in the same region to minimize cross‑region transfer
For data‑heavy workloads, consider pre‑processing or compressing objects before Lambda reads them, or push work into cheaper batch systems where appropriate.
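Compression is often the cheapest win for repetitive data such as JSON lines. The sketch below uses only the standard library and made-up records to show the size difference; actual ratios depend heavily on your data.

```python
import gzip
import json

# Hypothetical records; real payloads will compress differently.
records = [{"order_id": i, "status": "PENDING", "region": "eu-west-1"} for i in range(1000)]

raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
compressed = gzip.compress(raw)
restored = gzip.decompress(compressed)

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, "
      f"ratio: {len(compressed) / len(raw):.2%}")
assert restored == raw  # compression is lossless
```

Smaller objects mean fewer bytes transferred and less time spent on I/O inside the function. The trade-off is some CPU time for decompression, so measure both sides.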
9. FinOps Guardrails for Serverless
Technical optimizations work best when paired with basic financial governance.
Tag Everything
import * as cdk from "aws-cdk-lib";
cdk.Tags.of(this).add("Environment", "production");
cdk.Tags.of(this).add("Team", "payments");
cdk.Tags.of(this).add("CostCenter", "finops");
Create Budgets and Alerts
Use AWS Budgets or Cost Anomaly Detection to alert when:
Actual monthly cost exceeds 80% of budget
Forecasted spend is trending above your target
This gives you time to react before the bill hits.
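The two alert conditions above are simple to reason about in code. This is only an illustration of the logic with a naive linear forecast and a hypothetical `budget_alerts` helper; AWS Budgets uses its own forecasting, so its numbers will differ.

```python
def budget_alerts(spend_to_date: float, day_of_month: int, days_in_month: int,
                  budget: float, threshold: float = 0.8) -> list[str]:
    """Return which alerts would fire: 'actual', 'forecast', both, or none."""
    alerts = []
    if spend_to_date >= threshold * budget:
        alerts.append("actual")
    # Naive forecast: assume the daily run rate so far continues to month end.
    forecast = spend_to_date / day_of_month * days_in_month
    if forecast > budget:
        alerts.append("forecast")
    return alerts


# Hypothetical month with a 1000 USD budget.
print(budget_alerts(450, day_of_month=10, days_in_month=30, budget=1000))
print(budget_alerts(850, day_of_month=28, days_in_month=30, budget=1000))
print(budget_alerts(200, day_of_month=15, days_in_month=30, budget=1000))
```

The first case shows why the forecast alert is worth having: spend is under half the budget on day 10, but the run rate already points well past it.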
Putting It All Together
A cost‑optimized serverless workload in 2025 typically includes:
Lambda functions tuned with Power Tuning and running on Graviton2
Event filtering on S3/DynamoDB/SQS to avoid unnecessary invocations
Batching for SQS/Kinesis consumers
Log retention set to 7-30 days and log levels tuned per function
Reserved concurrency on critical, high‑traffic functions
Savings Plans for the steady portion of compute usage
Tagging, budgets, and dashboards to track cost over time
Start with your top three most expensive functions or services, measure impact in Cost Explorer, then iterate. Small changes at high volume often yield the biggest savings.
If you use similar cost optimization patterns in your serverless workloads, share your experiences and questions in the comments. I would love to learn what has worked (or failed) in your environment.