What does Scaling mean and how do load balancers help?

Kanchan Jeswani
8 min read · Jan 5, 2022

First, let's understand the term scalability with the help of a real-life example:

Let's say you have opened a new restaurant with a handful of tables, each seating a few customers.

Now, as your customer base grows, you start running out of space and need a way to accommodate more customers. Your first move is to add more chairs to the existing tables.

This slightly increases the seating capacity of the restaurant, but as demand keeps growing, you can only squeeze in so many chairs. Eventually you need another way to accommodate more people: increasing the number of tables.

This will surely help in making space for more customers.

Similarly, if you're a developer, building a website and serving it to users can be one of the most pleasurable things you do. When a huge number of users start using your services and the numbers keep growing day after day, it feels amazing, and you start thinking about taking your services to the next level. But soon you realize that your machine or server cannot handle such a large number of requests anymore and may stop responding at any time. Now you need a way to scale your application for this enormous number of requests: you have to expand your app's accessibility, power, and presence.

You can solve this problem by adding extra hardware or by upgrading the current system configuration, and this capacity to grow is called scalability. In simple terms, scalability means increasing the capacity of your service to meet growing demand, just as we added chairs and tables in the restaurant example to seat more customers.

Now, let's understand the two techniques of scaling: vertical scaling and horizontal scaling.

1. Vertical Scaling:

In simple terms, increasing the capacity of a single machine, or moving to a new machine with more power, is called vertical scaling. You can add more power to your machine with better processors, more RAM, or other hardware upgrades.

2. Horizontal Scaling:

Horizontal scaling increases the capacity of a system by adding more machines. This entails connecting multiple machines together so they can take on more of the system's requests.

In the restaurant example above, adding chairs to increase the capacity of each table was vertical scaling; when we realized that was not enough, we added more of the resource itself, i.e. the tables, which is horizontal scaling.

Now that we have understood the term scaling and its types, let's move on to a real-life scenario where the need for scaling shows up:

Consider a scenario where an application is running on a single server and clients connect to that server directly.

There are two main problems with this model:

  • Single Point of Failure: If the server goes down, or something happens to it, the whole application is interrupted and becomes unavailable to users for some period of time. This creates a bad experience for users.
  • Overloaded Servers: There is a limit to the number of requests a web server can handle. If the business grows and the number of requests increases, the server becomes overloaded. To handle the increasing number of requests, we need to add a few more servers and distribute the requests across this cluster of servers.

As we can see, when a website becomes extremely popular, its traffic increases and so does the load on the single server. The concurrent traffic overwhelms the server, and the website becomes slower for users.

To serve these high volumes of requests and return correct responses quickly and reliably, we need to scale the server. This can be done by adding more servers (horizontal scaling) to the network and distributing the requests across them. But… who is going to decide which request should be routed to which server…???
The answer is… a Load Balancer :)

A load balancer is a server that usually sits between client devices and a set of servers and distributes client requests across those servers. Load balancers can be placed at various points in a system. The load needs to be spread across the servers in a balanced way; that's why they are called load balancers.

So, we may say that load balancing is the process of distributing client requests across multiple servers. If a server is unavailable or not responding, the LB stops sending requests to it. In system design, horizontal scaling is a common strategy for scaling a system to a large number of users, and the load balancer is what makes horizontal scaling practical. By balancing the incoming traffic, the LB prevents any single server from becoming overloaded and ensures better overall throughput for the system.
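To make this concrete, here is a minimal sketch in Python of a load balancer acting as a tiny HTTP reverse proxy: it sits between clients and a pool of servers and forwards each incoming request to the next server in the pool. The backend addresses are hypothetical placeholders, and this is an illustration of the idea rather than production code.

```python
# A minimal sketch, not production code: a load balancer as an HTTP
# reverse proxy. The backend addresses below are hypothetical; swap in
# your own servers.
from http.server import BaseHTTPRequestHandler, HTTPServer
from itertools import cycle
from urllib.request import urlopen

BACKENDS = cycle([
    "http://127.0.0.1:9001",  # hypothetical backend server 1
    "http://127.0.0.1:9002",  # hypothetical backend server 2
])

class LoadBalancerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(BACKENDS)  # pick the next server in rotation
        with urlopen(backend + self.path) as resp:
            body = resp.read()
        # Relay the backend's response to the client.
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Clients talk only to port 8080; the balancer fans requests out.
    HTTPServer(("0.0.0.0", 8080), LoadBalancerHandler).serve_forever()
```

Notice that clients only ever see the balancer on port 8080; which backend actually served them is invisible, and that is exactly what lets us add or remove servers behind it freely.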

Where Are Load Balancers Typically Placed?

A load balancer can typically be placed:

  • In between the client application/user and the server
  • In between the server and the application/job servers
  • In between the application servers and the cache servers
  • In between the cache servers and the database servers

Now we have understood that a load balancer distributes the load among multiple servers, which increases the performance of the system, makes it more scalable, and avoids a single point of failure. But how does the load balancer know which request should be sent to which server? There are two main factors that a load balancer considers before forwarding a client request to a server:

  1. First, the load balancer needs to ensure that the chosen server is responsive, meaning it is actually responding to requests.
  2. Second, the LB uses a pre-configured algorithm to select one server from the set of responsive, healthy servers.

What does a healthy server mean?

A healthy server is one that successfully passes the health checks performed by the load balancer.

Health check: Load balancers need to forward traffic only to healthy, responsive servers. To monitor health, the LB periodically tries to connect to each backend server to ensure that it is listening. If a server fails to respond to a health check, it is removed from the pool, and requests are not forwarded to it until it is responsive again.
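Here is a minimal sketch of such a health-check loop, assuming (hypothetically) that each backend exposes a /health endpoint that returns HTTP 200 when the server is up:

```python
# A minimal health-check sketch. The /health endpoint and the backend
# addresses are assumptions for illustration only.
import time
from urllib.request import urlopen

BACKENDS = ["http://127.0.0.1:9001", "http://127.0.0.1:9002"]  # hypothetical
healthy = set(BACKENDS)  # the pool of servers eligible for traffic

def is_healthy(server: str) -> bool:
    """Return True if the server answers its health check in time."""
    try:
        with urlopen(server + "/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False

while True:
    for server in BACKENDS:
        if is_healthy(server):
            healthy.add(server)      # responsive again: back in the pool
        else:
            healthy.discard(server)  # failed: stop forwarding to it
    time.sleep(5)  # re-check the whole pool every few seconds
```

The forwarding logic would then pick a server only from the healthy set, never from the full backend list.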

Now, let us understand the algorithms load balancers use to decide where each request should go. These techniques are really different strategies for server selection; here is a list of the common ones (a code sketch of several of them follows the list):

  • Random selection: In this method, a server is selected at random; no other factors are considered. The risk with this technique is that some servers may sit idle while others get overloaded with requests.
  • Round Robin: This is one of the most common load balancing methods. The LB directs incoming traffic to a set of servers in a fixed order: the first request goes to server 1, the second to server 2, and so on. When the LB reaches the end of the list, it starts over from server 1 again. It balances the traffic almost evenly between the servers, but server specifications are not considered, so the servers need to have roughly equal specifications for this method to be useful. Otherwise, a server with low processing power may carry the same load as one with high processing capacity.
  • Weighted Round Robin: This is an updated version of the round-robin method, designed to handle servers with different characteristics, which was a problem for plain round robin. A weight is assigned to each server; this can be an integer value that varies with the processing power of the server. Requests are distributed in proportion to these weights, so the more powerful servers get a bigger share of the overall requests.
  • Least Connection: Here, the load balancer sends traffic to the server with the fewest active connections at the moment the client request is received. To do this, the load balancer needs some additional bookkeeping to identify the server with the least number of connections. This may be slightly costlier than round robin, but the decision is based on the current load on each server.
  • Least Response Time: This algorithm sends client requests to the server with the fewest active connections and the lowest response time. The backend server that responds the fastest receives the next request.
  • Source IP Hashing: In this method, a hash of the client's IP address is generated and used to select a server for that client. Even if the connection is broken, the client's next request will still go to the same server, so this method is useful when a client needs to reconnect to a session that is still active after a disconnection. It can also maximize cache hits and improve performance.
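To make these strategies concrete, here is a minimal sketch of several of them in Python. The server names, weights, and connection statistics are all hypothetical illustration data; a real load balancer would track this state live.

```python
# Minimal sketches of server-selection strategies. All names, weights,
# and statistics below are hypothetical illustration data.
import hashlib
import itertools
import random

SERVERS = ["server1", "server2", "server3"]

# Random selection: pick any server; no other factors are considered.
def random_selection():
    return random.choice(SERVERS)

# Round robin: rotate through the pool in order, wrapping at the end.
rotation = itertools.cycle(SERVERS)
def round_robin():
    return next(rotation)

# Weighted round robin: a server with weight 3 appears three times in
# the rotation, so it receives a proportionally bigger share of requests.
WEIGHTS = {"server1": 3, "server2": 1, "server3": 1}  # hypothetical weights
weighted_rotation = itertools.cycle(
    [s for s in SERVERS for _ in range(WEIGHTS[s])]
)
def weighted_round_robin():
    return next(weighted_rotation)

# Least connection: pick the server with the fewest active connections.
def least_connection(active):
    # active maps server -> number of currently open connections
    return min(active, key=active.get)

# Least response time: fewest active connections, then fastest response.
def least_response_time(stats):
    # stats maps server -> (active_connections, avg_response_seconds)
    return min(stats, key=lambda s: stats[s])

# Source IP hashing: the same client IP always maps to the same server.
def source_ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

if __name__ == "__main__":
    print([round_robin() for _ in range(4)])  # server1, 2, 3, then 1 again
    print(least_connection({"server1": 7, "server2": 2, "server3": 5}))
    print(source_ip_hash("203.0.113.7"))      # sticky: always the same server
```

Note how source_ip_hash is deterministic: as long as the pool does not change, a given client IP always lands on the same server, which is exactly what makes it suitable for session stickiness.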

With this, we come to the end of the blog. I hope you have gained some knowledge about scaling and load balancing.

LinkedIn: https://www.linkedin.com/in/kanchan-jeswani-888827173/

Github: https://github.com/kanchan1910

Thank You !!!

Signing off…
