What is the N+1 problem?

In software engineering, and more specifically in database management and ORM (Object-Relational Mapping) frameworks, the N+1 query problem is a common performance issue generated by non-optimized query design. This problem can significantly degrade the performance of applications, leading to increased load times and increased resources used by the database.

The N+1 query problem occurs when an application needs to retrieve related data from a database: instead of fetching all the required data in a single query, the application runs one initial query to retrieve a list of items (the “1” query), and then runs additional queries for each item to retrieve the related data (the “N” queries), generating a total of N+1 queries.

For example, let’s say that in our hypothetical e-commerce application we want to create a product listing page to show all the shoes that we have in the catalog.

As the first thing we will perform a query to fetch and filter from the database all the products:

SELECT * 
FROM product 
WHERE category = 'shoes'

Then we will run one additional query for each product to retrieve its main picture:

SELECT * 
FROM product_picture 
WHERE product_id = 123 AND is_main = 1

In this scenario we will have 1 query for the listing and N queries (one for each product) for the pictures.

If we have 3 products, we will perform 4 queries. If we have 100 products, we will perform 101 queries. Not good at all 🤔

n+1-example.png

N+1 in ORM

Fall into this problem it’s even easier when using ORM frameworks like ActiveRecord (Ruby on Rails), Hibernate (Java), Laravel (PHP), Django (Python) and so on.

ORM frameworks abstract the details of database interactions, making it easy for developers to perform CRUD operations without writing a single SQL query. While this abstraction simplifies development, it can also obscure the performance implications of certain operations, especially when it comes to relationships between entities.

Many ORM frameworks default to lazy loading for related entities (lazy loading means that related data is only fetched from the database when it is first accessed). This can lead to the n+1 problem if a collection or related entity is accessed within a loop, causing additional queries for each related entity.

For example, using Laravel Framework with his Eloquent ORM system:

// Initial listing query
$products = Product::where('category', 'shoes')->get();
foreach ($products as $product) {
    // One query for each product
    $picture = $product.mainPicture();
}

Solutions

Eager loading

Eager loading means to retrieve all the necessary data at the same time using a single query instead of one for each item.

This strategy in plain SQL can be implemented using JOIN operations, while in the ORM world usually specific functions and techniques are provided to address this issue.

Following our previous e-commerce example, in plain SQL:

SELECT *
FROM product
JOIN product_picture ON <join-clause>
WHERE product.category = 'shoes' AND product_picture.is_main = 1

Same using Laravel ORM:

// the with() clause tells the ORM to eager load the specified relationships
$products = Product::where('category', 'shoes')->with('picture')->get();

Batching or subqueries

Batching queries involves retrieving the related data in fewer operations (for example) by batching IDs and fetching data in chunks instead of one by one.

# One query for the listing
SELECT * 
FROM product 
WHERE category = 'shoes'

# One query for ALL the product details
SELECT * 
FROM product_picture
WHERE product_id IN (1, 2, 3, ..., 10)

In some use cases the same result can be also achieved using subqueries instead of separated queries, reducing the number of requests and taking advantage of the database query optimizer to increase performance.

Caching

If none of the previous solutions is feasible, then implementing caching strategies can help alleviate the load on the database by storing frequently accessed data in memory. This can be achieved using tools like Redis, Memcached or ORM-level caching features.