This is an explanation of how I made an API endpoint respond to requests around 50 times faster. It comes from a production system serving millions of users, some of whom want to receive email notifications.

Scenario

Let’s say that you have four kinds of record:

  • users
  • subscriptions
  • subscriber_lists
  • documents

A document is some JSON with many top-level attributes:

{
  "title": "Harry Potter",
  "author": "JK Rowling",
  ...
}

When a document is created or updated you need to notify users of that change.

Users specify the kind of documents they want to hear about by creating a subscription, which points to a subscriber list.

Subscriptions:

id  subscriber_list  person
1   1                john
2   1                jane
3   2                jack

Subscriber lists:

id  attributes
1   {"title": ["Harry Potter", "Twilight"]}
2   {"author": ["JK Rowling"]}
3   {"title": ["Harry Potter", "Twilight"], "author": ["JK Rowling", "Mark Twain"]}

When a document is changed, you must find any matching subscriber lists and then notify the subscribed users.

However, what we are interested in here is how users identify which subscriber list to subscribe to in the first place, or whether they need to create a new list.

If a user wants to be notified whenever a document with a title of Harry Potter or Twilight is created or updated, they need to create a subscription to the subscriber_list with id 1.

We need to find a subscriber list that is an exact match for the attributes the user wants to subscribe to, since we don’t want duplicate subscriber lists.

Input:

{"title":["Harry Potter","Twilight"]}

Output:

{ subscriber_list: 1 }

As another restriction, the database storing these records is postgres 9.3, which cannot be immediately upgraded, and the attributes field is a plain json type (not jsonb).

Existing solution

Take a look at the following ruby (rails) code:

def find_subscriber_list(query)
  return [] unless query.present?

  # Ask postgres for every list whose attribute keys match the query's keys
  subscriber_lists = SubscriberList.where(
    "ARRAY(SELECT json_object_keys(attributes)) = ARRAY[:keys]",
    keys: query.keys
  )

  # Then, in ruby, keep only the lists whose values also match.
  # Sorting both sides makes the comparison order-independent.
  subscriber_lists.select do |subscriber_list|
    subscriber_list.attributes.all? do |key, list_values|
      query[key].sort == list_values.sort
    end
  end
end

This code would first use ActiveRecord to query postgres for all subscriber lists that had the same keys as the query.

For example, if the user wanted to subscribe by author and title, this code would find all subscriber_lists whose attribute keys were exactly author and title.

Once the subscriber lists were loaded into application memory, the code would loop through them to find those with matching values. Because order should not matter, both arrays are sorted before they are compared.

Something changes

There is little point optimising this code at this stage.

Currently there are a few hundred subscriber lists, mostly with differing sets of keys, so the majority of the work is being done by postgres with very few operations in ruby.

The query works, responds to requests efficiently, and is forgotten about.

That is, until many new subscriber lists are created, all using a single key, with dozens of criteria:

{ "tag": ["a12", "c32", "b521", "b212", "d230", "z291", ...] }

Now most of the work is done in ruby: because every one of these lists shares the single key tag, the postgres key-matching query no longer narrows anything down, and the loop and sort operations become the bottleneck.

There are a few things that can be improved about this code.

Firstly, the code returns all records that match, even though exact matching means we only ever want one. One optimisation would be to return early from the loop by replacing that select with a find (see the sketch below).

Another optimisation would be to store the arrays already sorted, so that we would not need to sort the list arrays in the loop. We could also sort the query’s values just once, rather than once per subscriber list.
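
A rough sketch of both tweaks together, keeping the method shape from the code above (storing the list values pre-sorted is assumed here, not shown):

def find_subscriber_list(query)
  return nil unless query.present?

  # Sort the query's values once, up front, rather than inside the loop
  sorted_query = query.transform_values(&:sort)

  subscriber_lists = SubscriberList.where(
    "ARRAY(SELECT json_object_keys(attributes)) = ARRAY[:keys]",
    keys: query.keys
  )

  # find stops at the first match instead of checking every list
  subscriber_lists.find do |subscriber_list|
    subscriber_list.attributes.all? do |key, list_values|
      # assumes list_values were sorted before being stored
      sorted_query[key] == list_values
    end
  end
end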

This is fine, but we’re still doing work in ruby, and we still need to bring pretty much the entire subscriber list table into memory for each query.

Things got slow. The query started to take up to 2 seconds. Moreover, the number of subscriber lists continued to grow, and the query got slower and slower.

Things could be easier

Improving the performance of this query would be easier in other circumstances.

Non-relational databases like MongoDB, and the jsonb data type in later postgres versions, make it possible to do all of this querying in the database, which is much faster.

For an example of the query performance difference between the jsonb and json data types in postgres, see the performance section of this interesting post.

For another thing, containment queries (which is effectively what the ruby code is doing) become possible with jsonb and later versions of postgres.
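
As an illustration, here is roughly what that could look like on postgres 9.4+ with a jsonb attributes column (a sketch of what we could not do on 9.3, not code from this system):

# Hypothetical: requires postgres >= 9.4 and a jsonb attributes column.
# @> checks that the stored attributes contain the queried key/value
# pairs, ignoring array order, and can be served by a GIN index.
SubscriberList.where("attributes @> ?", query.to_json)

Note that containment alone is not an exact match: a list with extra keys or values would also be returned, so a strict equality check would still be needed on top.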

However, as mentioned above, we can’t upgrade the db for around 6 months, and the query is getting slower by the day.

It would also help if I knew more about databases and query optimisation. I know very little, so that is another drawback.

Making it 50x faster

OK here come the hacks.

The goal was to move all of the query work into postgres. The solution I came up with was to generate a digest of each subscriber list’s attributes and then query on the digest instead.

I added a new column to the subscriber lists table, called attributes_digest.

This field contains a hash of the subscriber list’s attributes.

class AddAttributesDigestToSubscriberLists < ActiveRecord::Migration[5.2]
  def change
    add_column :subscriber_lists, :attributes_digest, :string
  end
end

I also added an index on the digest column. JSON fields in our postgres version can’t have indexes, so being able to index the digest was a win in itself.

class AddIndexesToSubscriberListDigest < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    add_index :subscriber_lists,
              :attributes_digest,
              algorithm: :concurrently
  end
end
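
Passing algorithm: :concurrently tells postgres to build the index without locking out writes to the live table. Concurrent index builds can’t run inside a transaction, which is why the migration also disables the wrapping DDL transaction.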

I added a class, HashDigest, to normalise a subscriber list’s attributes before hashing them. This is important for exact matches: two logically equal attribute hashes must produce the same digest regardless of key or value ordering. At its core, it does this:

Digest::SHA256.hexdigest(normalize_hash(hash))
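
The class itself isn’t shown above; here is a minimal sketch of how it could work. The sort-then-serialise strategy is my assumption of one reasonable implementation, not necessarily the production code:

require "digest"
require "json"

class HashDigest
  def initialize(hash)
    @hash = hash
  end

  def generate
    Digest::SHA256.hexdigest(normalize_hash(@hash))
  end

  private

  # Sort the keys, sort each array of values, then serialise to a
  # canonical JSON string, so equal hashes always digest identically
  def normalize_hash(hash)
    hash.sort.map { |key, values| [key, values.sort] }.to_json
  end
end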

I added a digest to all new and existing subscriber lists.

# in SubscriberList class
before_save do
  self.attributes_digest = HashDigest.new(attributes).generate
end

# migration
class AddDigestToExistingSubscriberLists < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def change
    # Backfill in batches so the whole table is never loaded at once
    SubscriberList.where("attributes_digest IS NULL").find_each do |list|
      list.attributes_digest = HashDigest.new(list.attributes).generate
      list.save!
    end
  end
end

Finally, I updated the query to use this new column:

def find_subscriber_list(query)
  digest = HashDigest.new(query).generate
  SubscriberList.find_by_attributes_digest(digest)
end
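
Because of the normalisation, equivalent queries produce identical digests regardless of ordering. For example, assuming the sketch above:

HashDigest.new("title" => ["Twilight", "Harry Potter"]).generate ==
  HashDigest.new("title" => ["Harry Potter", "Twilight"]).generate
# => true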

This seemingly simple change moves the work out of ruby (aside from generating a digest for queries and subscriber list updates) and into postgres.

The index makes it considerably faster to look up an exact match: instead of scanning the whole table, postgres can walk the index straight to the matching row.

Knowing it worked

I ran a benchmark to demonstrate the performance improvement.

99% of requests completed within 3161ms with the old query, compared to 66ms with the new query: almost 50x faster.

I created 25k subscriber lists locally, and fired requests at the old and new query using ab, a benchmarking tool:

ab -t 10 -r -c 5 "/endpoint?tag[]=a12,b23,c34,d45..."
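
Here -t 10 runs the benchmark for 10 seconds, -c 5 keeps 5 requests in flight at a time, and -r tells ab not to exit on socket errors.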

Benchmark results

The original query performed poorly:

18 requests completed.

Percentage of the requests served within a certain time (ms)
  50%   2518
  66%   2549
  75%   2744
  80%   2873
  90%   3138
  95%   3161
  98%   3161
  99%   3161
 100%   3161 (longest request)

The new query performed much better:

1167 requests completed.

Percentage of the requests served within a certain time (ms)
  50%    41
  66%    45
  75%    47
  80%    49
  90%    53
  95%    57
  98%    63
  99%    66
 100%   598 (longest request)

I was able to reproduce this performance improvement in a staging environment, so it went to production. There was a cliff-edge drop in response time for that endpoint, which confirmed that the optimisation was a success.

Conclusion

Thanks to Kevin, the senior developer who reviewed my work and helped me with the deploy plan for this.

This may not be an orthodox way to fix this problem, but if you’re in a similar situation, I hope this helps you. I’d be keen to hear from others who have solved this kind of problem in other ways.