I’ve been working on search applications at GDS. These Ruby applications use Unicorn as their web server, rather than Puma, which is the default server for Rails.

Recently finder-frontend, the Rails application responsible for serving search pages like gov.uk/search, has been receiving more traffic, as we’ve migrated some high-traffic search pages over to it.

After reading Tommaso Pavese’s article Unicorn vs Puma: Rails server benchmarks, I wanted to find out whether using Puma instead of Unicorn would have a noticeable effect on performance, as it did at Deliveroo.

The tl;dr is that Puma seems to be much more performant.

For details of the Puma vs Unicorn testing methodology, I recommend reading Pavese’s more comprehensive article, on which this test is based.

Running the test

I added Puma to finder-frontend and tested both requests per second (req/s) and response times for the HTML and JSON responses of two search UIs (four endpoints):

  • /search/all
  • /search/all.json
  • /search/news-and-communications
  • /search/news-and-communications.json

I ran the application using Puma (version 3.12.1), and then Unicorn (version 5.4.1), both with 4 worker processes. The Puma workers each had 5 threads.

The worker and thread counts were chosen fairly arbitrarily, based on a couple of blog posts I read. For a more rigorous test I’d need to run the benchmark with various numbers of threads and workers.
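For reference, that setup can be expressed in a Puma config file like the one below. This is a minimal sketch of the settings described above, not finder-frontend’s actual configuration; the equivalent Unicorn setting is shown for comparison.

```ruby
# config/puma.rb — a minimal sketch of the settings used in this test
# (not finder-frontend's actual config file)
workers 4      # 4 worker processes, matching the Unicorn setup
threads 5, 5   # 5 threads per worker (min, max)
preload_app!   # load the app before forking so workers can share memory

# config/unicorn.rb — the equivalent Unicorn setting; Unicorn workers are
# single-threaded, so there is no thread count to configure:
#   worker_processes 4
```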

I used a variation of Pavese’s benchmarking script (which uses ab, Apache’s HTTP server benchmarking tool) to test the endpoints.
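Pavese’s script essentially loops ab over each endpoint at increasing concurrency levels. Here is a dry-run sketch of that loop in Ruby — the base URL and the `-n 200` request count are assumptions, not the values from the original script, and it prints the ab commands rather than running them:

```ruby
# Build the ab command for every endpoint/concurrency combination.
# The base URL and request count (-n 200) are illustrative assumptions.
base_url = "http://localhost:3000"
endpoints = %w[/search/all /search/all.json
               /search/news-and-communications /search/news-and-communications.json]

commands = endpoints.flat_map do |path|
  [1, 10, 20, 30, 40, 50].map do |concurrency|
    "ab -n 200 -c #{concurrency} #{base_url}#{path}"
  end
end

# Swap `puts cmd` for `system(cmd)` to actually run the benchmark.
commands.each { |cmd| puts cmd }
```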

The four endpoints perform a mixture of IO- and CPU-intensive operations before responding. For each request, finder-frontend calls the external services content-store and search-api, then does some work to build those requests and render the data as an HTML or JSON response.

I pointed finder-frontend at live resources, so the assets (e.g. the GOV.UK logo) and the various data required to build the response were all requested from production services.


As the number of concurrent requests increases, Unicorn’s response time grows at a much faster rate when compared to Puma.

Below is the output from testing finder-frontend with 1, 10, 20, 30, 40, and 50 concurrent requests, and the corresponding impact on response time and the number of requests per second each server could handle.

Average response time (ms) for search/all.json

The difference is much more significant when rendering a large, complex HTML response than when rendering JSON:

Average response time (ms) for search/all

Avg. response time (ms)  unicorn (1)  puma (1)  unicorn (10)  puma (10)  unicorn (20)  puma (20)  unicorn (30)  puma (30)
/search                      984.299   994.443      5033.899   3575.62     11563.309   7480.481     20115.627  12075.395
/search.json                1164.067   500.484      1583.824   1022.505     3260.364   1938.761      5137.573   2835.526
/news                       1223.233   972.684      4550.558   3148.616    10468.458   7014.036     18017.334  11789.548
/news.json                   814.107   320.842      1099.361    557.855     2162.194   1008.966      3365.21    1585.42

I found that Puma was able to handle more requests per second than Unicorn. I suspect this is because Puma is multithreaded: while one thread is waiting for a response from search-api, another thread can do useful work, so each worker process can serve more concurrent requests.
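A toy illustration of that effect, with sleep standing in for an IO wait such as a search-api call (the timings are illustrative and not taken from this benchmark):

```ruby
require "benchmark"

# Two simulated IO waits (e.g. calls to search-api), 0.1s each.
# A single-threaded worker has to handle them one after the other...
serial = Benchmark.realtime do
  2.times { sleep 0.1 }
end

# ...whereas two threads can wait on IO concurrently, because sleeping
# (like network IO) releases Ruby's global VM lock.
threaded = Benchmark.realtime do
  2.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end

puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
```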

req/s for /search/all

Requests per second

req/s         unicorn (1)  puma (1)  unicorn (10)  puma (10)  unicorn (20)  puma (20)  unicorn (30)  puma (30)
/search              1.02      1.01          1.99      2.80          1.73      2.67          1.49      2.48
/news                0.82      1.03          2.20      3.18          1.91      2.85          1.67      2.54
/search.json         0.86      2.00          6.31      9.78          6.13     10.32          5.84     10.58
/news.json           1.23      3.12          9.10     17.93          9.25     19.82          8.91     18.92


This was a quick test, so there isn’t much to analyse beyond noting that I was able to reproduce Pavese’s results against a live application:

“Puma performs better than Unicorn in all tests that were either heavily IO-bound or that interleaved IO and CPU work.”

finder-frontend certainly fits this description: it calls several external services when responding to requests, while also doing a fair amount of CPU work to build search queries and render the results. I’d refer you to Pavese’s article for more on why this might be.


There’s a lot of value in consistency across applications in a service-oriented architecture. Migrating all of our Rack applications to Puma (if that turned out to be a good idea) would take time, and the applications would be less consistent in the meantime. So, to accept that short-term trade-off, I’d need more evidence that there’s substantial value in migrating.

I’m also interested in the memory usage implications of switching to Puma. While Deliveroo’s blog post suggests that Puma has a higher memory footprint, Finc cites memory constraints as the main reason they switched to Puma.

I’m going to do some more testing to see whether migrating to Puma would have the impact I think it would. Testing with tens of thousands of concurrent connections in a live environment, with many more threads, workers, and instances of the application, will likely produce different results than testing on my local machine.

Nevertheless, I found this an interesting experiment, and I now have a better idea of why Puma is the default Rails server.

If you’ve noticed any issues with this article, please let me know and I’ll be happy to correct them.