Reduce number of queries

Customers often call me because their site is slow. One of the most common problems I found was a high number of queries that get executed for every single page hit. When I say a lot, I mean sometimes more than 1000 queries for a single page. This is often the case with a CMS which has been customized for the client’s specific needs.

In this article, aimed at beginner to intermediate developers, I will explain how to figure out whether the number of queries might be a problem, how to count them, how to find spots to optimize and how to eliminate most of these queries. I will focus specifically on number of queries, otherwise I could write a whole tome. I’ll provide code examples in PHP, but the advice applies to every language.

Start from the top

Start by looking at your browser’s inspector. I will demonstrate using Chrome. Open the inspector and reload the page. Click the Network tab and the first result will be the document generated by your script.

Inspector

You’ll notice that it takes 950 ms to load. Don’t stop there, check the breakdown of that time by hovering over the timeline bar to eliminate any doubt.

inspector-2

According to this chart, most of the time is spent by the server generating the page. The two most likely reasons are: you either use too much CPU (processing) or too much disk (files or database). Now that we narrowed the problem down a bit, it’s time to see whether we in fact have a lot of queries.

Count queries

If you’re using a CMS or a framework, then you probably have access to a single function that sends all the queries or even logs them. For example, with Moodle (a learning system used by many universities) you can call $DB->perf_get_queries() at the bottom of the page to output the number of queries executed. Check the documentation of your product to find these functions.

If all the code is your own, then wrap all database calls in a single function and add a counter. Example:

My rule of thumb is to focus on pages that execute 10 or more queries. You can also log the individual queries in an array so that you can have access to a full list. This list will be helpful helpful to find which ones are very similar and can be combined.

One common reason for so many queries is executing them in a loop. For example, I often see a query for a list of Categories, a loop on these categories and then another query to get the list of Subcategories. Example:

Just here, there is a potential for dozens of queries. Sometimes, I see nested loops, where people would iterate on the subcategories to get the list of Courses. Now you’re probably up to 100 queries. Throw in a query to get some additional information for each Course, such as whether the user has completed them and you might be over 1000 queries.

Optimize

Once you get a log of all the queries that were executed, look for ones that look almost the same. Example:

I can tell right away that the first query gets all top-level categories and we then iterate on them to get subcategories for each. The best way to fix this is to combine. You do this by executing a hand-crafted query in the code instead of relying on the built-in functions:

This will fetch both the categories and their subcategories. It might look a bit complex, but I’m really just using table and column aliases. As a bonus, we only fetch the columns that we need instead of * (star), since getting data that we don’t need is just slowing things down. The results you get back will look like this:

In case you are confused about how to read this, you’re really getting one record for every Subcategory, with the Category data embedded into it in the first half. Now you just need to iterate over your results and each time the cat_id is not the same as the previous one, we know that we are at the next top-level category. Here’s an abbreviated code in case you’re unsure about how to do that:

If you’re lucky enough to use Doctrine or an equally good ORM, it will create a nested resultset for you that will look like this, which is much simpler to process in your code:

Conclusion

Joining tables helps you dramatically reduce the number of queries. If you only fetch the columns that you need, your queries will also execute faster. It is one of the most common ways in which I optimize code for my clients. On a recent project, I was able to reduce one page from ~1400 queries to 8 just by using joins. And the new queries I wrote weren’t any slower than the original ones. You do the math.

You MUST check how many queries you execute. It’s such a low-hanging fruit!

2 thoughts on “Reduce number of queries

  1. “Joining tables helps you dramatically reduce the number of queries.”

    The reduction in queries has little to do with the join. In fact the join itself is not really a good idea.

    The problem is that the ORM ran separate queries. That’s all.
    The “correct” solution would be to run two queries. one to fetch all main categories and one to fetch all subcategories. Then the application can simply do a nested loop to process them.
    So, the join only reduced the number of queries by one.

    But the join did something much more important; it greatly increased the size of the resultset.
    Instead of fetching each main category once, it now fetches it as many times as it has subcategories. Simply put; if you have 10 main categories with 500 bytes of data for each row, and you have a total of 300 subcategories, you are fetching 300×500=150kB, where 5kB would suffice.
    Of course, who cares about 150kB, but what if the subcategories take 200bytes per row, and you have 40 items worth 100bytes in each subcategory, then it’s 500b+200b+100b * 10*300*40 = 96MB. in 120.000 rows. Neither your database nor your application are going to process that in a hurry.

    If you want to make nested data more efficient, you should use a database that can handle nested data.

    • You are correct that in some situations, it makes more sense to group queries without necessarily do a join, specifically because there is a lot of data repetition in the results. Thanks for pointing this out.

      In my experience, I dealt with more cases where the actual resultset returned too few columns to bother splitting into 2 queries. Also, it happens often that we need to go deeper than 2 levels (Categories + Subcategories + Courses), in which case, the number of queries will keep increasing exponentially for every level. Some joins can also serve the purpose of getting related data that has been split across many tables for normalization purposes (Courses + Descriptions) but is actually on the same level. In these situations, joins cause no unnecessary overhead. It really depends on the specific situation, so I covered what I consider the most common cases.

      For developers who are willing to push further, I strongly recommend benchmarking: timing queries while taking into account number of queries, their time to execute and network latency.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">