Performance degradation during scale-out operations #45

felipmiguel · 2023-04-29T22:57:27Z

Describe the bug
During scale-out operations the performance of the application is degraded. It happens with both auto-scale and manual scale. It doesn't happen during scale-in operations.

To Reproduce
Steps to reproduce the behavior:

Given an application with a constant workload.
In the Azure portal, go to Apps and select the application to be scaled out.
Go to the scale-out settings.
Increase the number of instances of the application and save.
During the scale-out operation the application throughput is reduced significantly, for instance from 500K requests per second (RPS) to less than 100K RPS. This can be measured using Application Insights.
When the scale-out operation completes, the throughput recovers and works as expected, increasing in line with the new number of instances.

Expected behavior
The expected behavior is that the performance of the application stays stable during the scale-out operation. Once the operation is complete, throughput should improve.
That is especially important for systems under heavy load. Auto-scale is usually triggered when more performance is needed, but during the scale-out operation the performance of the solution goes down.

Screenshots

Additional context
To generate the load I was using Azure Load Testing. The load was constant, which is why it was possible to detect this situation.

Can we contact you for additional details? Y/N

If yes, please send us your contact information to [email protected] and include the issue number in the email title.

qingc · 2023-05-23T05:15:59Z

Hi @felipmiguel,
Thank you for submitting this issue.
We will try to reproduce this issue and figure out how to fix it.

qingc · 2023-06-05T13:29:02Z

Hi @felipmiguel,

I have reproduced this issue. The RPS drops during application scale-out; see the metrics from Application Insights.

On the dashboard on the test client (Azure Load Testing) side, response time increased during app scale-out.

It can also be observed on the server side; see the metrics from Application Insights.

From Application Insights live metrics, we can see that part of the requests are handled by new Pods, which have longer response times while they warm up. The application uses an in-memory cache, so responses from newly created Pods take longer than responses from existing Pods because the cache is not populated yet.
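
The cold-cache effect described above can be illustrated with a minimal sketch. It is not the application's actual code: `CachedLookup` and `slow_backend` are hypothetical names, and the backend stands in for whatever the real application caches. It only shows that a freshly started instance pays the backend cost on every first lookup, while a warmed-up instance does not.

```python
from typing import Callable, Dict


class CachedLookup:
    """Hypothetical in-memory cache in front of a slow backend call."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self._backend = backend
        self._cache: Dict[str, str] = {}
        self.misses = 0  # number of lookups that had to call the backend

    def get(self, key: str) -> str:
        # A miss calls the backend and stores the result for later requests.
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._backend(key)
        return self._cache[key]


def slow_backend(key: str) -> str:
    # Placeholder for an expensive call (database, remote service, ...).
    return key.upper()


# An existing Pod has already served these keys; a new Pod starts empty.
warm_pod = CachedLookup(slow_backend)
for k in ("a", "b", "c"):
    warm_pod.get(k)
warm_pod.misses = 0  # only count misses from this point on

cold_pod = CachedLookup(slow_backend)

for k in ("a", "b", "c", "a"):
    warm_pod.get(k)
    cold_pod.get(k)

print("warm pod misses:", warm_pod.misses)
print("cold pod misses:", cold_pod.misses)
```

Each miss on the new instance adds the backend latency to the request, which is consistent with the longer response times seen during warm-up.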

According to the RPS calculation formula, Virtual users = RPS * latency in seconds. Because the number of virtual users is constant, the RPS drops as latency increases. This explains why the RPS drops during application scale-out; it is expected behavior. For more details, see: Key concepts for Azure Load Testing | Microsoft Learn
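
The relation above can be worked through numerically. The sketch below rearranges it to RPS = virtual users / latency and blends the latency of warm and cold Pods by the share of requests each receives. All numbers (1,000 virtual users, 2 ms warm latency, 18 ms cold latency, half of the requests landing on new Pods) are illustrative assumptions chosen to mirror the 500K to 100K RPS drop reported in this issue; they are not measurements from the test.

```python
def closed_loop_rps(virtual_users: int, latency_seconds: float) -> float:
    """RPS sustained by a fixed number of virtual users: users / latency."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return virtual_users / latency_seconds


def blended_latency(warm_latency: float, cold_latency: float,
                    cold_share: float) -> float:
    """Mean latency when cold_share of the requests hit cold (new) Pods."""
    if not 0.0 <= cold_share <= 1.0:
        raise ValueError("cold_share must be between 0 and 1")
    return (1.0 - cold_share) * warm_latency + cold_share * cold_latency


VIRTUAL_USERS = 1000   # assumed, constant for the whole test
WARM_LATENCY = 0.002   # assumed: 2 ms on Pods with a populated cache
COLD_LATENCY = 0.018   # assumed: 18 ms on Pods that are still warming up

# Before scale-out: every request is served by a warm Pod.
rps_before = closed_loop_rps(VIRTUAL_USERS, WARM_LATENCY)

# During scale-out: assume half of the requests are routed to cold Pods.
latency_during = blended_latency(WARM_LATENCY, COLD_LATENCY, cold_share=0.5)
rps_during = closed_loop_rps(VIRTUAL_USERS, latency_during)

print(f"before scale-out: {rps_before:,.0f} RPS")
print(f"during scale-out: {rps_during:,.0f} RPS")
```

Under these assumptions the mean latency rises from 2 ms to about 10 ms, so the same 1,000 virtual users can only generate about one fifth of the original request rate until the new Pods finish warming up.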

allxiao added the Tracking label May 17, 2023

zhiszhan assigned qingc May 18, 2023
