Jul 20, 2016

Requests per second. A server load reference

As there seems to be a lot of misconceptions about what Big Data, there are also not really a good baseline to know “how much is high load”, specially from the point of view of people with not that much experience in servers. If you have some experience dealing with servers, you will probably know all this. So, just for the sake of convenience, I am going to do some back-of-the-envelope calculations to try to set a few numbers and explain how to calculate how many requests per second a server can hold.
We are going to use RPS (requests per second) as the metric. This measures the throughput, which is typically the most important measure. There are other parameters that can be interesting (latency) depending on the application, but in a typical application, throughput is the main metric.
Those requests can be pure HTTP requests (getting a URL from a web server), or can be other kind of server requests. Database queries, fetch the mail, bank transactions, etc. The principles are the same.

I/O bound or CPU bound

There are two type of requests, I/O bound and CPU bound.
Everything you do or learn will be imprinted on this disc. This will limit the number of requests you can keep open
Everything you do or learn will be imprinted on this disc. This will limit the number of requests you can keep open
Typically, requests are limited by I/O. That means that it fetches the info from a database, or reads a file, or gets the info from network. CPU is doing nothing most of the time. Due the wonders of the Operative System, you can create multiple workers that will keep doing requests while other workers wait. In this case, the server is limited by the amount or workers it has running. That means RAM memory. More memory, more workers.[1]
In memory bound systems, getting the number of RPS is making the following calculation:
RPS = (memory / worker memory)  * (1 / Task time)
For example:
Total RAMWorker memoryTask timeRPS
16Gb40Mb100ms4,000
16Gb40Mb50ms8,000
16Gb400Mb100ms400
16Gb400Mb50ms800
Crunch those requests!
Crunch those requests!
Some other requests, like image processing or doing calculations, are CPU bound. That means that the limiting factor in the amount of CPU power the machine has. Having a lot of workers does not help, as only one can work at the same time per core.  Two cores means two workers can run at the same time. The limit here is CPU power and number of cores. More cores, more workers.
In CPU bound systems, getting the number of RPS is making the following calculation:
RPS = Num. cores * (1 /Task time)
For example:
Num. coresTask timeRPS
410ms400
4100ms40
1610ms1,600
16100ms160
Of course, those are ideal numbers. Servers need time and memory to run other processes, not only workers.  And, of course, they can be errors. But there are good numbers to check and keep in mind.

Calculating the load of a system

If we don’t know the load a system is going to face, we’ll have to make an educated guess. The most important number is the sustained peak. That means the maximum number of requests that are going to arrive at any second during a sustained period of time. That’s the breaking point of the server.
That can depend a lot on the service, but typically services follow a pattern with ups and downs. During the night the load decreases, and during day it increases up to certain point, stays there, and then goes down again. Assuming that we don’t have any idea how the load is going to be, justassume that all the expected requests in a day are going to be done in 4 hours. Unless load is very very spiky, it’ll probably be a safe bet.
For example,1 million requests means 70 RPS. 100 million requests mean 7,000 RPS. A regular server can process a lot of requests during a whole day.
That’s assuming that the load can be calculated in number of requests. Other times is better to try to estimate the number of requests a user will generate, and then move from the number of users. E.g. A user will make 5 requests in a session. With 1 Million users in 4 hours, that means around 350 RPS at peak. If the same users make 50 requests per sessions, that’s 3,500 RPS at peak.
HULK CAN HOLD ANY LOAD!
HULK CAN HOLD ANY LOAD!

A typical load for a server

This two numbers should only be user per reference, but, in my experience, I found out that are numbers good to have on my head. This is just to get an idea, and everything should be measured. But just as rule of thumb.
1,000 RPS is not difficult to achieve on a normal server for a regular service.
2,000 RPS is a decent amount of load for a normal server for a regular service.
More than 2K either need big servers, lightweight services, not-obvious optimisations, etc (or it means you’re awesome!). Less than 1K seems low for a server doing typical work (this means a request that is simple and not doing a lot of work) these days.
Again, this are just my personal “measures”, that depends on a lot of factors, but are useful to keep in mind when checking if there’s a problem or the servers can be pushed a little more.
1 – Small detail, async systems work a little different than this, so they can be faster in a purely I/O bound system. That’s one of the reasons why new async frameworks seems to get traction. They are really good for I/O bound operations. And most of the operations these days are I/O bound.

0 comments:

Post a Comment

Nam Le © 2014 - Designed by Templateism.com, Distributed By Templatelib