Reduce number of queries

October 28th, 2014

Customers often call me because their site is slow. One of the most common problems I found was a high number of queries that get executed for every single page hit. When I say a lot, I mean sometimes more than 1000 queries for a single page. This is often the case with a CMS which has been customized for the client's specific needs.

In this article, aimed at beginner to intermediate developers, I will explain how to figure out whether the number of queries might be a problem, how to count them, how to find spots to optimize and how to eliminate most of these queries. I will focus specifically on number of queries, otherwise I could write a whole tome. I'll provide code examples in PHP, but the advice applies to every language.

Start from the top

Start by looking at your browser's inspector. I will demonstrate using Chrome. Open the inspector and reload the page. Click the Network tab and the first result will be the document generated by your script.

Inspector

You'll notice that it takes 950 ms to load. Don't stop there, check the breakdown of that time by hovering over the timeline bar to eliminate any doubt.

inspector-2

According to this chart, most of the time is spent by the server generating the page. The two most likely reasons are: you either use too much CPU (processing) or too much disk (files or database). Now that we narrowed the problem down a bit, it's time to see whether we in fact have a lot of queries.

Count queries

If you're using a CMS or a framework, then you probably have access to a single function that sends all the queries or even logs them. For example, with Moodle (a learning system used by many universities) you can call $DB->perf_get_queries() at the bottom of the page to output the number of queries executed. Check the documentation of your product to find these functions.

If all the code is your own, then wrap all database calls in a single function and add a counter. Example:

protected $num_queries = 0;
public function query_db($sql, $params) {
  $this->num_queries++;
  // execute here
}
public function get_num_queries() {
  return $this->num_queries;
}

My rule of thumb is to focus on pages that execute 10 or more queries. You can also log the individual queries in an array so that you can have access to a full list. This list will be helpful helpful to find which ones are very similar and can be combined.

One common reason for so many queries is executing them in a loop. For example, I often see a query for a list of Categories, a loop on these categories and then another query to get the list of Subcategories. Example:

$categories = get_categories();
foreach ($categories as $category) {
  $subcategories = get_categories($category->id);
}

Just here, there is a potential for dozens of queries. Sometimes, I see nested loops, where people would iterate on the subcategories to get the list of Courses. Now you're probably up to 100 queries. Throw in a query to get some additional information for each Course, such as whether the user has completed them and you might be over 1000 queries.

Optimize

Once you get a log of all the queries that were executed, look for ones that look almost the same. Example:

select * from course_categories where parent = 0;
select * from course_categories where parent = 1;
select * from course_categories where parent = 3;
select * from course_categories where parent = 15;
select * from course_categories where parent = 22;

I can tell right away that the first query gets all top-level categories and we then iterate on them to get subcategories for each. The best way to fix this is to combine. You do this by executing a hand-crafted query in the code instead of relying on the built-in functions:

SELECT
cat.id AS cat_id, cat.parent, cat.name AS cat_name,
subcat.id AS subcat_id, subcat.parent, subcat.name AS subcat_name
FROM course_categories AS cat
LEFT JOIN course_categories AS subcat ON subcat.parent = cat.id
WHERE cat.parent = 0
ORDER BY subcat.parent;

This will fetch both the categories and their subcategories. It might look a bit complex, but I'm really just using table and column aliases. As a bonus, we only fetch the columns that we need instead of * (star), since getting data that we don't need is just slowing things down. The results you get back will look like this:

+--------+--------------------------+-----------+------------------------------------+
| cat_id | cat_name                 | subcat_id | subcat_name                        |
+--------+--------------------------+-----------+------------------------------------+
|      1 | Category 1               |         2 | Subcategory 1-1                    |
|      3 | Category 2               |         4 | Subcategory 2-1                    |
|      3 | Category 2               |         5 | Subcategory 2-2                    |
+--------+--------------------------+-----------+------------------------------------+

In case you are confused about how to read this, you're really getting one record for every Subcategory, with the Category data embedded into it in the first half. Now you just need to iterate over your results and each time the cat_id is not the same as the previous one, we know that we are at the next top-level category. Here's an abbreviated code in case you're unsure about how to do that:

$results = $DB->execute($sql);
$prev_id = null;
foreach ($results as $subcategory) {
  if  ($prev_id == null || $prev_id != $subcategory['cat_id']) {
    // output category info here
  }
  // output subcategory info here
  $prev_id = $subcategory['cat_id'];
}

If you're lucky enough to use Doctrine or an equally good ORM, it will create a nested resultset for you that will look like this, which is much simpler to process in your code:

array(
  array(
    id => 1,
    name => Category 1,
    subcategories => array(
      array(
          id => 4,
          name => Subcategory 1-1,
      )
    )
  ),
  array(
      id => 3,
      name => Category 2
      subcategories => ...
  )
)

Conclusion

Joining tables helps you dramatically reduce the number of queries. If you only fetch the columns that you need, your queries will also execute faster. It is one of the most common ways in which I optimize code for my clients. On a recent project, I was able to reduce one page from ~1400 queries to 8 just by using joins. And the new queries I wrote weren't any slower than the original ones. You do the math.

You MUST check how many queries you execute. It's such a low-hanging fruit!

Previous: Should I write a code of conduct? Next: RE: How to Submit a Talk to a Conference