Lambda Environment Variables' Impact on Coldstarts
Did you know that setting environment variables on a Lambda function could add over 20 ms to your coldstart times? In this post, I'll talk about how I discovered this and when it matters.
The discovery
I was looking in CloudTrail to determine when Lambda was calling `sts:AssumeRole` to answer two questions I had about coldstarts:
- Is assuming your execution role included in `Init Duration`?
- Is it done in parallel with other steps?
Unfortunately, since CloudTrail doesn't have millisecond granularity, I couldn't determine this from timing information alone. I did learn something interesting, however: Lambda assumes your execution role multiple times 🤔.
TIL Lambda assumes your execution role 3 times at coldstart by looking at CloudTrail.
1. To decrypt env vars with KMS
2. To pass to handler
3. To emit x-ray traces (if enabled)
I wonder why not assume it once? And whether 1 and 2 are done serially and impact coldstarts?
— David Behroozi (@rooToTheZ) March 8, 2024
I can't answer why Lambda assumes the role multiple times, but I can answer whether assuming the role impacts coldstarts by testing it. Let's dig in!
Does having environment variables impact coldstarts?
To determine this, I invoked some Lambdas with and without environment variables.
Does it impact Init Duration?
No. To test, I ran the same function with and without environment variables and checked the delta in `Init Duration`. The average delta was about 5 ms. Sometimes the function with environment variables was faster, sometimes the function without them was faster. Since 5 ms isn't enough time to call both `sts:AssumeRole` and `kms:Decrypt`, and the winner alternates, I concluded that assuming the execution role happens outside of `Init Duration`. I ran the test every 3 hours for 24 hours (8 times).
Delta between functions for Init Duration (ms):
min | avg | p50 | max |
---|---|---|---|
1.42 | 5.365 | 5.13 | 10.08 |
Does it impact the total run time of the function during a coldstart?
Yes. To test, I created a Lambda that invoked Lambdas with and without environment variables and measured the E2E latency of the invocations. To make sure that establishing the SSL connection wasn't adding overhead, I made a call to Lambda to prime the client, enabled connection reuse, and randomized the order of the invocations. Every time, the function without environment variables was faster. I ran the test every 3 hours for 24 hours (8 times).
Delta between functions for E2E latency (ms):
min | avg | p50 | max |
---|---|---|---|
12 | 43 | 21 | 107 |
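A summary row like the one above can be computed from paired measurements with a small helper. This is a minimal sketch, not the code from the benchmark repo: the name `summarizeDeltas`, the `Summary` type, and the use of the median for p50 are all assumptions made for illustration.

```typescript
// Hypothetical helper: summarizes per-run latency deltas (with env vars minus
// without env vars), in milliseconds. Not taken from the benchmark repo.
type Summary = { min: number; avg: number; p50: number; max: number };

function summarizeDeltas(withEnv: number[], withoutEnv: number[]): Summary {
  if (withEnv.length === 0 || withEnv.length !== withoutEnv.length) {
    throw new Error("Expected two non-empty arrays of equal length");
  }

  // Pair up run i of each function and sort the deltas in ascending order.
  const deltas = withEnv
    .map((ms, i) => ms - withoutEnv[i])
    .sort((a, b) => a - b);

  // Assumption: p50 is the median; with an even count, average the middle two.
  const mid = Math.floor(deltas.length / 2);
  const p50 =
    deltas.length % 2 === 1 ? deltas[mid] : (deltas[mid - 1] + deltas[mid]) / 2;

  const avg = deltas.reduce((sum, d) => sum + d, 0) / deltas.length;

  return { min: deltas[0], avg, p50, max: deltas[deltas.length - 1] };
}
```

For example, with made-up latencies, `summarizeDeltas([110, 130, 150, 170], [100, 100, 100, 100])` gives a min of 10, an avg and p50 of 40, and a max of 70.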
Does it impact you?
Yes, but whether there is any action to take is up to you. If you have low traffic or rapid scale-out, coldstarts make up a larger share of your invocations, and if you are using a fast-starting runtime like Rust, Go, or LLRT, 25 ms might be more than 10% of your coldstart time. If your environment variables provide value, keep them. If they aren't sensitive and don't change often, consider moving them to a file packaged with your code. If you have different variables per stage, you can set the Lambda function name based on stage and use that to look up the variables in your packaged file.
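The per-stage lookup might look something like the sketch below. Everything in it is hypothetical: the `StageConfig` shape, the `orders-*` values, and the convention that function names end in `-<stage>` are assumptions, and the inline `packagedConfig` map stands in for a JSON file that would be bundled with your code.

```typescript
// Hypothetical settings shape; replace with whatever your function needs.
type StageConfig = { tableName: string; logLevel: string };

// Stand-in for a file packaged with the code (for example, a bundled JSON file).
// The stage names and values here are made up for illustration.
const packagedConfig = new Map<string, StageConfig>([
  ["dev", { tableName: "orders-dev", logLevel: "debug" }],
  ["prod", { tableName: "orders-prod", logLevel: "info" }],
]);

// Assumption: function names end with "-<stage>", e.g. "orders-api-prod".
function loadConfig(functionName: string): StageConfig {
  const stage = functionName.split("-").pop() ?? "";
  const config = packagedConfig.get(stage);
  if (config === undefined) {
    throw new Error(`No packaged config for stage "${stage}"`);
  }
  return config;
}
```

In a Node.js handler you could pass `context.functionName` to `loadConfig` once, outside the hot path, and reuse the result across invocations.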
Tip
If you are using the AWS JavaScript V3 SDK and see the `AWS_NODEJS_CONNECTION_REUSE_ENABLED` environment variable, remove it. The V3 SDK doesn't use it! If you use the CDK `NodejsFunction` construct, set the property `awsSdkConnectionReuse: false` to remove it or it will always be set. If it is the only environment variable, you'll save time on coldstarts.
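In CDK, that setting might look like the fragment below. It is a sketch rather than a complete app: the stack name, construct ID, entry path, and runtime version are placeholders, and it assumes `aws-cdk-lib` is installed in your project.

```typescript
import { App, Stack } from "aws-cdk-lib";
import { Runtime } from "aws-cdk-lib/aws-lambda";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";

const app = new App();
const stack = new Stack(app, "ExampleStack"); // hypothetical stack name

new NodejsFunction(stack, "ExampleFunction", {
  entry: "src/handler.ts", // hypothetical path to your handler
  runtime: Runtime.NODEJS_20_X, // placeholder; use the runtime you deploy with
  // Stops CDK from adding AWS_NODEJS_CONNECTION_REUSE_ENABLED to the function.
  awsSdkConnectionReuse: false,
});
```

After deploying, it is worth checking the function's configuration in the console to confirm that no environment variables remain.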
Some unanswered questions
If you look in CloudTrail, you'll see 3 `AssumeRole` calls for a function with environment variables.
The first two make some sense. The first is used to get credentials to call KMS to decrypt the environment variables. The second is to get credentials to pass into your function. But there is a third call 5 minutes after the invocation, even though there were no other invocations. Looking at what the access key is used for, it is calling `kms:Decrypt` again.
So I have the following unanswered questions.
- Why doesn't Lambda call `AssumeRole` once and reuse the credentials for both decrypting environment variables and passing to the handler?
- Why does Lambda call `kms:Decrypt` again 5 minutes after the invocation?
- Could Lambda set the session name on the `AssumeRole` call to `functionName/logStreamName` instead of `functionName` to make it easier to trace which instance the credentials are for?
Conclusion
Be aware that setting environment variables in Lambda typically adds 20 ms or more to your coldstart time (12 ms at minimum and 21 ms at the median in my tests). This applies to all runtimes. Decide whether the value environment variables provide is worth the cost, and discontinue their use if it isn't.
Further Reading
- This repo: lambda-env-variables-coldstart-benchmark-cdk contains the code I used to test if you want to try and replicate the results.
- More details on the `AWS_NODEJS_CONNECTION_REUSE_ENABLED` variable not being used in the V3 JavaScript SDK.
- My blog on the timeline of a Lambda request.