Lambda cold start times in .NET 6: ARM vs. x64

Why cold start time matters

Cold start time can be an important metric to optimize in a serverless application, such as one built on AWS Lambda. I maintain some functions with HTTP endpoints where quick response time is crucial for a good user experience. That convinced me to take a closer look and measure the cold start times of a .NET application hosted on AWS Lambda.

In the world of web performance, experts often say that anything that happens in under 100 milliseconds feels instant to a human. But serverless functions that are not used frequently can exceed that threshold by a wide margin. Mikhail Shilkov measured cold starts in Azure Functions. His results show that a cold start of an Azure Function written in C# and hosted in Azure’s Consumption Plan can exceed 10 seconds. That is a long time to wait for a response.

Does the choice of CPU architecture impact cold start time in AWS?

I decided to measure the impact of CPU architecture on the cold start time of Lambda functions hosted in AWS. The choice between the default x86_64 (x64 in Microsoft’s terminology) and ARM64 is one of the first decisions to make when creating an AWS Lambda function. So, does it matter which one we choose if our priority is to minimize latency?

Here are the assumptions I made in my measurements:

- The function code used in the tests does almost nothing beyond proving that it started successfully and can handle a request. It is mostly boilerplate code that comes with the project blueprint named Annotations Framework (preview).
- I tested a range of memory settings between 128 MB and 3008 MB, because in AWS Lambda adding more memory proportionally increases the amount of CPU. I couldn’t assign more than 3008 MB to any function due to limits on my account. But as we’ll see in the results, returns diminish quickly, so going beyond 3008 MB is unlikely to change much.
- I tested only the .NET 6 runtime (the current LTS version at the time of writing).
- I am interested in the cold start time, but what I actually measure is the end-to-end response time of the Lambdas, as seen on the client side. This means that the measured time includes:
  - the round trip time to the AWS data center where the Lambdas are deployed (I measured it at about 30 ms).
  - the cold start time, when a cold start occurs. The variance in this component is the interesting part.
  - the function’s own execution time, which I attempted to minimize by giving it almost nothing to do.
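To make that breakdown concrete, here is a minimal Python sketch of how the cold start component could be isolated from client-side timings. It is not the code used for these measurements. The function name and its inputs are hypothetical, and it assumes that a warm request consists of only the round trip plus the function's own execution time, so that subtracting the warm median cancels both.

```python
from statistics import median


def estimate_cold_start_overhead(cold_ms, warm_ms):
    """Estimate the cold start overhead in milliseconds.

    cold_ms: end-to-end response times of requests that hit a cold start.
    warm_ms: end-to-end response times of requests served by a warm function.

    Assumption: both kinds of requests share the same round trip time and
    function execution time, so the difference of medians approximates the
    cold start cost alone.
    """
    if not cold_ms or not warm_ms:
        raise ValueError("both sample lists must be non-empty")
    return median(cold_ms) - median(warm_ms)
```

Medians are used rather than means so that a single unusually slow request does not dominate the estimate.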

The results: cold start time in .NET 6

Let me start by visualizing all collected data points:

A chart showing the relation between a .NET 6 AWS Lambda’s inactivity time and the total time needed to handle a subsequent request.

Some conclusions I draw from the above chart:

- Cold starts are observed after about 5 minutes of function inactivity. They result in noticeably longer response times (as anticipated).
- The response time during a cold start is heavily impacted by the amount of memory. Functions with only 128 MB of memory require about 1.5 seconds to serve the first request after a period of inactivity. Functions with 1024 MB of memory typically require just 0.5 seconds.
- Requests to functions hosted on the ARM64 architecture appear to be slightly slower on average than their x86_64 counterparts. But that might be misleading, as the points that stand out on the chart are not necessarily representative. It deserves a closer look, so keep reading 🙂
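Data points of this kind, pairing an idle period with the response time of the next request, could be collected with a loop like the one below. This is a hedged sketch and not the code behind the chart. `probe_after_idle` and its parameters are hypothetical names, and `send_request` stands for any callable that performs one HTTP request to the function and waits for the response.

```python
import time
from typing import Callable, Iterable, List, Tuple


def probe_after_idle(
    send_request: Callable[[], None],
    idle_seconds: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[float, float]]:
    """Return (idle_seconds, response_ms) pairs, one per idle period."""
    samples = []
    # Warm-up call, so that the first idle period starts from a warm function.
    send_request()
    for idle in idle_seconds:
        sleep(idle)  # leave the function untouched for the idle period
        started = clock()
        send_request()  # end-to-end time as seen by the client
        samples.append((idle, (clock() - started) * 1000.0))
    return samples
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting. In practice `send_request` might wrap something like `urllib.request.urlopen(url).read()` for the function's endpoint.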

Cold starts: ARM64 vs. x86_64

I collected some more data overnight and split it into two bins, ARM64 and x86_64. Then I compared the bins separately for each memory setting. Here’s how it looks:

Time needed to process a request in a cold start scenario with **128 MB** of memory.

Time needed to process a request in a cold start scenario with **512 MB** of memory.

Time needed to process a request in a cold start scenario with **2048 MB** of memory.

The observed difference in median cold start times between ARM64 and x86_64, comparing functions that run with the same amount of memory. The more memory we add, the smaller the difference between the CPU variants.
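The comparison behind the last chart (bin by architecture, then compare medians per memory setting) can be sketched in a few lines of Python. This is an illustration rather than the analysis actually used, which was done in Tableau. The function name and the sample layout `(architecture, memory_mb, response_ms)` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def median_gap_by_memory(samples):
    """Return {memory_mb: median ARM64 time - median x86_64 time} in ms.

    samples: iterable of (architecture, memory_mb, response_ms) tuples,
    where architecture is "arm64" or "x86_64" (labels assumed here).
    """
    bins = defaultdict(list)
    for arch, memory_mb, response_ms in samples:
        bins[(arch, memory_mb)].append(response_ms)

    gaps = {}
    for memory_mb in sorted({memory for _, memory in bins}):
        arm = bins.get(("arm64", memory_mb))
        x86 = bins.get(("x86_64", memory_mb))
        # Only compare memory settings measured on both architectures.
        if arm and x86:
            gaps[memory_mb] = median(arm) - median(x86)
    return gaps
```

A positive value means ARM64 was slower at that memory setting, and a negative value means it was faster.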

Discussion

So, what story does the data tell? When the amount of memory is held constant, .NET 6 Lambdas running on ARM64 machines start slightly slower than those on x86_64 machines. That difference decreases as we run on more powerful machines.

The difference could easily be compensated for by adding some more memory (and thus CPU power) to the ARM64 Lambdas. As the measurements confirm, the cold start time of .NET Lambdas is very sensitive to changes in the memory parameter.

The performance of .NET on ARM gets a lot of attention nowadays. The newer .NET 7 has already received performance improvements in this area. I haven’t tested it here because it’s not a Long-Term Support release and is therefore not available as a managed runtime in AWS Lambda. The results may well be different there, and I’m sure there will be improvements in the absolute values for both CPU architectures.

My impression is that the differences in cold start times, while they exist, are not very significant. In .NET 6, cold start time doesn’t look like an important factor to consider when choosing a CPU architecture.

The source code used to perform these AWS Lambda cold start measurements is available on GitHub.
An interactive version of the above charts is available on Tableau Public.
A look at Lambda cold start improvements in .NET 7 (discussed only in the context of x64 and Native AOT).