What is autoscaling?
Vertical scaling means increasing or decreasing the compute size (i.e. CPU or memory). Generally, it means changing the pricing tier.
Horizontal scaling means adding or removing instances, with each instance having the same compute power. This may also increase the cost, but the pricing tier stays the same.
Autoscaling mostly means horizontal scaling: adding or removing instances based on certain preset conditions.
Azure Monitor collects metrics and logs from almost all types of resources. This data can be used to configure autoscaling, either demand-based or schedule-based.
- Demand-based autoscaling means increasing or decreasing the number of instances depending on the current load.
- Schedule-based autoscaling means increasing or decreasing the number of instances based on a schedule.
In this article, we will configure demand-based autoscaling for Azure App Service.
Note that you also have the option to manually scale the service out or in, but this is inefficient, as someone from the operations team would still need to continuously keep an eye on when a scale-out or scale-in is required.
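For reference, manual scaling can also be done from the command line. Here is a minimal sketch using the Azure CLI; the plan and resource group names are placeholders:

```shell
# Manually set the App Service plan to run 2 instances.
# "myPlan" and "myResourceGroup" are placeholder names.
az appservice plan update \
  --name myPlan \
  --resource-group myResourceGroup \
  --number-of-workers 2
```

This suffers from the same drawback described above: someone has to run it at the right time.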
Create Azure App Service
Follow the steps given in this article to create an app service and publish a web application to that app service instance.
Configure Auto Scaling Rule
You may observe that while creating the app service instance, it also (optionally) creates an app service plan. This app service plan defines the quantity of server resources (e.g. how much memory and how many CPU cores).
From the overview blade, click on the app service plan and it will open the app service plan details panel. Then select Scale out (App Service plan) from the left navigation. By default, the Manual scale option is selected.
Select Custom autoscale as shown above and then scroll down. Next, select the Scale based on a metric radio button. This means we intend to scale based on metrics collected by Azure Monitor (e.g. memory usage, CPU usage, etc.).
Now, click on Add a rule, which will open the Scale rule panel on the right-hand side. On this panel, provide the inputs below:
- Time aggregation: the aggregation method for the metric.
- Metric namespace: leave it at the default.
- Metric name: select CPU Percentage.
- Dimension, operator and values: leave them at their defaults.
- Operator: the comparison operator used to compare CPU Percentage against the threshold.
- Metric threshold to trigger scale action: the value of the CPU Percentage metric that should trigger this rule. For this demo, set it to 0.5 so that we can easily check whether the scaling rule works.
- Duration in minutes: the amount of time the scale engine will look back over when evaluating the metric.
- Time grain and Time grain statistic: leave these at their default of one minute.
The next section, Action, describes what action needs to be taken:
- Operation: either increase or decrease the count by some number. Let's select the increase count operation.
- Instance count: the number by which the instance count should be increased or decreased. Let this be 1.
- Cool down in minutes: the number of minutes the scale engine should wait before triggering this action again. Let this be 5 minutes.
Once all these inputs are entered, click the Add button to save the rule.
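The portal steps above can also be sketched with the Azure CLI. This assumes an autoscale setting named "myAutoscale" already exists for the App Service plan; all names are placeholders, and the 10-minute look-back duration is illustrative:

```shell
# Sketch: a scale-out rule equivalent to the one configured above.
# Scale out by 1 instance when average CpuPercentage over the last
# 10 minutes exceeds 0.5 (demo threshold), with a 5-minute cooldown.
az monitor autoscale rule create \
  --resource-group myResourceGroup \
  --autoscale-name myAutoscale \
  --condition "CpuPercentage > 0.5 avg 10m" \
  --scale out 1 \
  --cooldown 5
```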
Minimum, Maximum and Default
In addition to configuring the scale rule, there are also settings to configure the minimum, maximum and default number of instances.
If there is any problem while the scale engine tries to read the metrics and, at that time, the number of instances is less than the default, the scale engine will scale out to the default number of instances.
When a scale rule that describes how to scale out is applied, the number of available instances increases by some count. Increasing instances also increases the Azure bill, hence it is always best practice to configure a maximum number of instances.
Minimum instances is the number of instances that you want to keep running when load is "normal".
It is best practice to thoughtfully configure these three numbers based on the needs of your application and your findings from load testing.
Configuring the scale rule sets these three values for us; by default, minimum is 1, maximum is 3 and default is 2.
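These three values can also be supplied when creating the autoscale setting from the Azure CLI. A minimal sketch, with placeholder names for the resource group, setting and plan:

```shell
# Sketch: create an autoscale setting for an App Service plan with
# minimum 1, maximum 3 and default 2 instances.
az monitor autoscale create \
  --resource-group myResourceGroup \
  --name myAutoscale \
  --resource myPlan \
  --resource-type Microsoft.Web/serverfarms \
  --min-count 1 \
  --max-count 3 \
  --count 2
```

Scale rules can then be attached to this setting with `az monitor autoscale rule create`.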
Verify the rule
First, I enabled manual scaling and set the number of instances to 1. Then I restarted the app service, and the number of instances was still one.
Now, if you access the application continuously for about 15 minutes, you should see scale-out operations and the number of instances should increase. In my case, the number of instances increased to the maximum, 3, as shown in the snapshot below.
The snapshot also shows a scale-in rule and the instance count scaling back in to 1. Note that we have not covered that in the steps above, but you can easily configure a new scale rule with the action set to decrease the count by some number.
Before we end
Note that we have configured a very low CPU percentage to trigger scaling out. This is only for demonstration purposes. In the real world, you may want to configure a sufficiently higher value so as to ensure cost efficiency. You should also consider the results from your load tests to see at what CPU percentage your application's response time starts increasing, causing a poor user experience, and configure the metric threshold based on those results.
Also, we have added only one rule, to scale out if the CPU percentage is higher than some value. In the real world, you will also want to add a scale-in rule, which tells the scale engine how to reduce the number of instances. The steps for configuring the rule are the same, except Action -> Operation should be set to decrease the instance count.
If you do not configure a scale-in rule, Azure will not automatically scale in, meaning your application might always run on the maximum number of instances even when that many instances are not really required.
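A matching scale-in rule can be sketched in the Azure CLI as well. As before, the names, threshold and look-back duration are placeholders chosen for illustration:

```shell
# Sketch: scale in by 1 instance when average CpuPercentage over the
# last 10 minutes drops below 0.25, with a 5-minute cooldown.
az monitor autoscale rule create \
  --resource-group myResourceGroup \
  --autoscale-name myAutoscale \
  --condition "CpuPercentage < 0.25 avg 10m" \
  --scale in 1 \
  --cooldown 5
```

Keeping the scale-in threshold well below the scale-out threshold helps avoid "flapping", where instances are repeatedly added and removed.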
Also, we have added this scale rule to the default scale condition. There can be more than one scale condition, and one of them needs to be the default. The default scale condition is executed when no other condition is met.
I hope you liked this article. Let me know your thoughts.