Execute CTE queries using Doctrine ORM

There are a lot of developers who manipulate databases using an ORM and don’t know more SQL than the “classic” SELECT ... FROM ... WHERE ... queries. But databases have various unknown features that can avoid programing data processing. One of them is CTE (for Common Table Expressions).

This article describes how to use this feature with PHP Doctrine ORM.

First, let me explain what a CTE is. A Common Table Expression (CTE) is a named temporary result set that you define at the beginning of a SQL query using the WITH clause. It exists only for the duration of that query and can be referenced like a table within it.

To illustrate how it works, imagine a blog where articles can be stored and attached to one category. Categories are organized in a tree structure. Now, we want to retrieve a specific category with all its parents. The naïve PHP implementation can be something like:

// src/Repository/CategoryRepository.php
class CategoryRepository extends ServiceEntityRepository
{
    // ...

    /**
     * @return list<Category>
     */
    function getCatagoriesWithParents(int $categoryId): array
    {
        $category = $this->categoryRepository->find($categoryId);

        $categories = [
            $category,
        ];

        while ($parent = $category->getParent()) {
            $categories[] = $parent;
        }

        return $categories;
    }
}

With the previous version, Doctrine will execute one query per loop: it’s the common N+1 problem. To resolve this issue through an SQL query, we can’t use a basic query. We need a recursive CTE. This SQL query can look like:

WITH RECURSIVE cte_category AS (
    -- starting point
    SELECT id, label, parent_id, 0 AS depth
    FROM category
    WHERE id = :id

    UNION ALL

    -- recursivity
    SELECT c.id, c.label, c.parent_id, cp.depth + 1
    FROM category c
    INNER JOIN cte_category cp ON c.id = cp.parent_id
)
SELECT id, label, parent_id FROM cte_category
ORDER BY depth DESC

First, the previous query retrieves the :id category. Then, recursively, it retrieves all the parents. The depth column is used to order the final result.

The problem with this kind of SQL query is that Doctrine is not able to execute them natively. We can’t use the QueryBuilder or a DQL (Doctrine Query Language) query. We need to execute a native query and map the result on an object.

// src/Repository/CategoryRepository.php
class CategoryRepository extends ServiceEntityRepository
{
    // ...

    public function findAllWithParents(int $categoryId)
    {
        $rsm = new ResultSetMappingBuilder($this->getEntityManager());
        $rsm->addRootEntityFromClassMetadata(Category::class, 'c');

        $sql = <<<SQL
            WITH RECURSIVE with_categories AS (
                SELECT id, label, parent_id, 0 AS depth
                FROM category
                WHERE id = :id

                UNION ALL

                SELECT c.id, c.label, c.parent_id, cp.depth + 1
                FROM category c
                INNER JOIN with_categories cp ON c.id = cp.parent_id
            )
            SELECT id, label, parent_id FROM with_categories
            ORDER BY depth DESC
        SQL;

        return $this->getEntityManager()
            ->createNativeQuery($sql, $rsm)
            ->setParameter('id', $categoryId)
            ->getResult();
    }
}

In the previous code, the ResultSetMappingBuilder maps the query results to the Category entity object properties. The output will be a list of Category ordered by ancestors.

This kind of query allows a lot of things. For example, we can retrieve every post of a specific category, including the category parents:

WITH RECURSIVE category_path AS (
    SELECT id, label, parent_id
    FROM category
    WHERE id = :id

    UNION ALL

    SELECT c.id, c.label, c.parent_id
    FROM category c
    INNER JOIN category_path cp ON c.id = cp.parent_id
)
SELECT a.id, a.title, a.category_id
FROM article a
INNER JOIN category_path cp ON a.category_id = cp.id
ORDER BY a.published_date DESC

With only one query, every post is retrieved. Without a recursive CTE, multiple successive queries will be necessary. The drawback is it depends on the database engine and could not be available in some databases.

Moreover, using the ResultSetMappingBuilder requires all columns needed to hydrate the entity should be included in the SELECT. Otherwise, the entity will be partially hydrated without any explicit notice or error.

Jérémy DECOOL

Jérémy DECOOL

As a software architect, I share my thoughts on best practices in software development, software architecture and team organization for maintainable and scalable projects.