[SOLVED] Increasing the number of workers in a cluster no longer increases total performance


I’m experimenting with Node.js cluster features along with PM2. Here is the little script I use for testing:

// server.js
import { createServer } from 'http'

const { pid } = process
const server = createServer((req, res) => {
  res.end(`Hello from ${pid}`)
})

server.listen(8080, () => console.log(`Started at ${pid}`))

I use wrk for load testing:

wrk -c 200 -d 10 -t 4 http://localhost:8080

I start server.js with PM2 and run the wrk load test, adding one instance of the service for each run. Here are the results:

server.js with 1 instance   --> Requests/sec:  46139.15
server.js with 2 instances  --> Requests/sec:  89343.35
server.js with 3 instances  --> Requests/sec:  124294.58
server.js with 4 instances  --> Requests/sec:  137826.08
server.js with 5 instances  --> Requests/sec:  134193.62
server.js with 12 instances --> Requests/sec:  123073.60

(Everything was run on my local machine, an iMac with an Intel i9-9900K CPU (16 logical cores) @ 3.60GHz.)

As you can see, from 4 instances onward the performance gain becomes smaller, and from 5 instances onward performance actually decreases. (I can confirm that the network load is not maxed out at this point: it is only 65MB/s, compared to the network card’s capacity of 1GB/s.)

Another strange behavior is that as the number of instances increases, CPU usage during the test also increases, yet performance behaves as described above.

So my question is: why does performance drop once the number of instances reaches 5? It seems that adding instances does not increase performance any more.


Since your server doesn’t really do much that is CPU intensive, you may not be CPU-bound at all (and your low CPU usage implies that too). You may be network-bound, or limited by some other bottleneck that kicks in long before you reach a CPU limit. Thus when you add CPUs beyond some level, you don’t see any benefit.

If you add a 100ms spin loop to your request handler to create actual CPU load, you will probably see very different results, and adding more CPUs should show more benefit. Keep in mind that more CPUs only help when you’re actually CPU-bound.
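
A minimal sketch of what such a spin loop could look like is below. The names spin, createSpinServer and the SPIN_PORT environment variable are made up for illustration, and the file is written in CommonJS only so that it runs without any package.json settings. Note that at 100ms of spinning per request, a single instance cannot serve much more than about 10 requests per second, so compare the scaling between instance counts and not the absolute numbers.

```javascript
// spin-server.js (hypothetical file name)
const { createServer } = require('node:http')

// Busy-wait for roughly `ms` milliseconds to simulate CPU-bound work.
// Returns the number of loop iterations, just so the loop has an observable result.
function spin(ms) {
  const end = Date.now() + ms
  let iterations = 0
  while (Date.now() < end) iterations++
  return iterations
}

// Build a server whose handler burns CPU before replying.
// Assumption: 100 ms default, matching the figure suggested above.
function createSpinServer(spinMs = 100) {
  const { pid } = process
  return createServer((req, res) => {
    spin(spinMs)
    res.end(`Hello from ${pid}`)
  })
}

// Start listening only when SPIN_PORT (a hypothetical variable) is set,
// e.g. SPIN_PORT=8080 node spin-server.js
if (process.env.SPIN_PORT) {
  createSpinServer().listen(Number(process.env.SPIN_PORT), () =>
    console.log(`Started at ${process.pid}`)
  )
}
```

Re-running the same wrk test against this version with 1, 2, 3, ... instances should show whether throughput now tracks the instance count more closely.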

Also, keep in mind that with your processor where you have 8 real cores that are hyperthreaded to appear to have 16 cores, those extra 8 are really just "more efficient" threads. If you aren’t doing a lot of thread context switching, then those extra 8 virtual CPUs may not provide any real benefit.
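
As a rough illustration, Node.js only reports logical CPUs, so the physical core count has to be estimated. The sketch below assumes 2-way hyperthreading (true for the i9-9900K, but not for every machine), and the variable names are made up for this example.

```javascript
const os = require('node:os')

// os.cpus() lists logical CPUs, so each hyperthreaded core is counted twice.
const logicalCores = os.cpus().length

// Assumption: 2-way hyperthreading, so physical cores are about half the logical count.
const estimatedPhysicalCores = Math.max(1, Math.floor(logicalCores / 2))

console.log(`logical: ${logicalCores}, estimated physical: ${estimatedPhysicalCores}`)
```

The estimated physical count can serve as a starting point for the number of instances to test, leaving some cores free for wrk if it runs on the same machine.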

And, the most accurate testing of your server configuration will be when the client load is on a different host and you are actually using the network to send/receive requests. localhost requests don’t actually go through the network.

Answered By – jfriend00

Answer Checked By – Jay B. (BugsFixing Admin)
