Data Loaders
Due to the flexibility of GraphQL, loading the objects related to a given object often requires executing multiple queries. This leads to the notorious N+1 query problem. To solve it, we can use DataLoader. DataLoader can merge multiple requests into a single one, reducing the number of database queries, and it also caches query results to avoid redundant fetches.
The N+1 Query Problem
Consider a scenario where we need to query all users and their respective posts. Our data table structure is as follows:
import { drizzleSilk } from "@gqloom/drizzle"
import { relations } from "drizzle-orm"
import * as t from "drizzle-orm/pg-core"

export const roleEnum = t.pgEnum("role", ["user", "admin"])

export const users = drizzleSilk(
  t.pgTable("users", {
    id: t.serial().primaryKey(),
    createdAt: t.timestamp().defaultNow(),
    email: t.text().unique().notNull(),
    name: t.text(),
    role: roleEnum().default("user"),
  })
)

export const usersRelations = relations(users, ({ many }) => ({
  posts: many(posts),
}))

export const posts = drizzleSilk(
  t.pgTable("posts", {
    id: t.serial().primaryKey(),
    createdAt: t.timestamp().defaultNow(),
    updatedAt: t
      .timestamp()
      .defaultNow()
      .$onUpdateFn(() => new Date()),
    published: t.boolean().default(false),
    title: t.varchar({ length: 255 }).notNull(),
    authorId: t.integer().notNull(),
  })
)

export const postsRelations = relations(posts, ({ one }) => ({
  author: one(users, { fields: [posts.authorId], references: [users.id] }),
}))
A straightforward resolver implementation might look like this:
import { field, query, resolver } from "@gqloom/core"
import { eq } from "drizzle-orm"
import { db } from "src/db"
import { posts, users } from "src/schema"

export const userResolver = resolver.of(users, {
  users: query(users.$list()).resolve(() => db.select().from(users)),

  posts: field(posts.$list())
    .derivedFrom("id")
    .resolve((user) =>
      db.select().from(posts).where(eq(posts.authorId, user.id))
    ),
})
When we execute the following query:
query usersWithPosts {
  users {
    id
    name
    posts {
      id
      title
    }
  }
}
The backend execution flow will be:
- Execute one query to fetch the list of all users (SELECT * FROM users).
- For each returned user, execute another query to fetch that user's posts (SELECT * FROM posts WHERE authorId = ?).
If the first query returns N users, then to fetch their posts, we would collectively execute 1 (fetch users) + N (fetch posts for each user) queries. This is known as the "N+1 Query Problem". When N is large, this puts immense pressure on the database, leading to performance bottlenecks.
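To see the problem for yourself, you can enable drizzle-orm's built-in query logger, which prints every SQL statement it executes. Below is a minimal sketch of such a db setup; it assumes a node-postgres connection and a DATABASE_URL environment variable, so adapt it to your own driver and configuration.

import { drizzle } from "drizzle-orm/node-postgres"
import { Pool } from "pg"
import * as schema from "./schema"

const pool = new Pool({ connectionString: process.env.DATABASE_URL })

// With logger: true, drizzle prints each SQL statement it runs, which makes
// the 1 (users) + N (posts per user) query pattern easy to observe.
export const db = drizzle(pool, { schema, logger: true })

Running the usersWithPosts query above with this setup prints one SELECT for the users followed by one SELECT on posts per returned user.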
GQLoom provides powerful tools to elegantly solve this problem.
field().load() Method
The simplest way is to use the field().load() method. It transforms the resolver function from handling a single parent object to handling a batch of parent objects, allowing for bulk data fetching.
The load method accepts an asynchronous function as a parameter. The first parameter of this function is an array of parent objects, parents, and subsequent parameters are the input arguments, args, for that field. This asynchronous function needs to return an array of the same length as the parents array, where each element corresponds to the result for a parent object.
INFO
It is crucial that the returned array strictly matches the order and length of the parents array. DataLoader relies on this order to correctly map results back to each parent object.
Let's look at an example. To solve the N+1 problem mentioned above, we can modify the resolver like this:
import { field, resolver } from "@gqloom/core"
import { inArray } from "drizzle-orm"
import { db } from "src/db"
import { posts, users } from "src/schema"

export const userResolver = resolver.of(users, {
  posts: field(posts.$list())
    .derivedFrom("id")
    .load(async (userList) => {
      // 1. Fetch all posts for the users at once
      const postList = await db
        .select()
        .from(posts)
        .where(
          inArray(
            posts.authorId,
            userList.map((u) => u.id)
          )
        )
      // 2. Group posts by authorId
      const grouped = Map.groupBy(postList, (p) => p.authorId)
      // 3. Map the posts back to each user in order
      return userList.map((u) => grouped.get(u.id) ?? [])
    }),
})
In the code above, the load function receives a userList array. We extract the id of all users and use the inArray operation to fetch all related posts from the database in a single query. Then, we group the posts by authorId and finally map them back to an array whose order matches userList.
Thus, regardless of how many users we request, the query to the posts table will only be executed once.
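As mentioned earlier, the load callback also receives the field's input arguments after the parents array. The following sketch adds a hypothetical published filter to the posts field; it assumes that field().input() accepts a validation schema in the same way as query().input() shown later in this chapter, and that the chaining order below is accepted.

import { field, resolver } from "@gqloom/core"
import { and, eq, inArray } from "drizzle-orm"
import { db } from "src/db"
import { posts, users } from "src/schema"
import * as v from "valibot"

export const userResolver = resolver.of(users, {
  posts: field(posts.$list())
    .derivedFrom("id")
    // Hypothetical input argument; adapt to your validation library
    .input({ published: v.optional(v.boolean()) })
    .load(async (userList, { published }) => {
      // Still a single query for the whole batch, optionally filtered
      const postList = await db
        .select()
        .from(posts)
        .where(
          and(
            inArray(
              posts.authorId,
              userList.map((u) => u.id)
            ),
            published === undefined ? undefined : eq(posts.published, published)
          )
        )
      const grouped = Map.groupBy(postList, (p) => p.authorId)
      return userList.map((u) => grouped.get(u.id) ?? [])
    }),
})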
LoomDataLoader
field().load() is a convenient API provided by GQLoom, which internally creates and manages DataLoader instances for us. However, in some scenarios we might need finer control, or want to share the same data loader instance across different resolvers. In such cases, we can use LoomDataLoader.
GQLoom provides the LoomDataLoader abstract class and the EasyDataLoader convenience class for creating custom data loaders.
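Their shapes look roughly as follows; this is a type-level sketch inferred from how they are used in the examples below, not the exact declarations in @gqloom/core.

// Rough sketch inferred from the examples in this chapter; the actual
// declarations in @gqloom/core may differ in detail.
declare abstract class LoomDataLoader<TKey, TValue> {
  // Implemented by subclasses: resolve a whole batch of keys at once,
  // returning results (or Errors) in the same order as `keys`.
  protected abstract batchLoad(keys: TKey[]): Promise<(TValue | Error)[]>
  // Called per key; calls within the same request are merged and served
  // by a single batchLoad invocation, as described below.
  load(key: TKey): Promise<TValue>
}

declare class EasyDataLoader<TKey, TValue> {
  constructor(batchLoad: (keys: TKey[]) => Promise<(TValue | Error)[]>)
  load(key: TKey): Promise<TValue>
}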
Custom Data Loaders (LoomDataLoader)
We can create a custom data loader by extending LoomDataLoader and implementing the batchLoad method.
import { LoomDataLoader, field, query, resolver } from "@gqloom/core"
import { createMemoization } from "@gqloom/core/context"
import { inArray } from "drizzle-orm"
import { db } from "src/db"
import { posts, users } from "src/schema"
import * as v from "valibot"

// 1. Create a custom DataLoader
export class UserLoader extends LoomDataLoader<
  number,
  typeof users.$inferSelect
> {
  protected async batchLoad(
    keys: number[]
  ): Promise<(typeof users.$inferSelect | Error)[]> {
    const userList = await db
      .select()
      .from(users)
      .where(inArray(users.id, keys))
    const userMap = new Map(userList.map((u) => [u.id, u]))
    return keys.map(
      (key) => userMap.get(key) ?? new Error(`User ${key} not found`)
    )
  }
}

// 2. Use createMemoization to create a shared loader instance within the request
export const useUserLoader = createMemoization(() => new UserLoader())

// 3. Use it in the resolvers
export const postResolver = resolver.of(posts, {
  author: field(users)
    .derivedFrom("authorId")
    .resolve((post) => {
      const loader = useUserLoader()
      return loader.load(post.authorId)
    }),
})

export const userResolver = resolver.of(users, {
  user: query(users)
    .input({ id: v.number() })
    .resolve(({ id }) => {
      const loader = useUserLoader()
      return loader.load(id)
    }),
})
To ensure that each request has an independent data loader instance and to prevent cached data from leaking between requests, we combine the loader with the createMemoization function from Context, as shown in step 2 above. This creates a singleton loader within the lifecycle of each request.
In this example, when useUserLoader() is called multiple times within the same GraphQL request, it will return the same UserLoader instance. Therefore, multiple calls to loader.load(id) will be automatically batched, and the batchLoad function will only be executed once.
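The same batching applies when the loader is used directly. Here is a small sketch of that behavior, assuming the UserLoader class defined above is exported from a hypothetical ./user-loader module:

import { UserLoader } from "./user-loader" // hypothetical module path

const loader = new UserLoader()

// Both loads are issued before awaiting, so they are collected into one batch:
// batchLoad runs once with keys [1, 2].
const [alice, bob] = await Promise.all([loader.load(1), loader.load(2)])
console.info(alice.name, bob.name)

// Results are cached on the instance, so this resolves without another query.
await loader.load(1)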
Convenient Data Loaders (EasyDataLoader)
If you are not a fan of object-oriented programming, you can use EasyDataLoader. It accepts a batchLoad function as a constructor parameter.
The useUserLoader above can be simplified with EasyDataLoader:
import { EasyDataLoader, field, resolver } from "@gqloom/core"
import { createMemoization } from "@gqloom/core/context"
import { inArray } from "drizzle-orm"
import { db } from "src/db"
import { posts, users } from "src/schema"

const useUserLoader = createMemoization(() => {
  return new EasyDataLoader<number, typeof users.$inferSelect>(async (keys) => {
    const userList = await db
      .select()
      .from(users)
      .where(inArray(users.id, keys))
    const userMap = new Map(userList.map((u) => [u.id, u]))
    return keys.map(
      (key) => userMap.get(key) ?? new Error(`User ${key} not found`)
    )
  })
})

// The usage in the resolver remains the same
export const postResolver = resolver.of(posts, {
  author: field(users)
    .derivedFrom("authorId")
    .resolve((post) => {
      const loader = useUserLoader()
      return loader.load(post.authorId)
    }),
})
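Finally, the resolvers are woven into a GraphQL schema and served as usual. Below is a minimal sketch using graphql-yoga; it assumes the resolvers above are exported from a hypothetical ./resolvers module, and since userResolver uses valibot inputs, ValibotWeaver from @gqloom/valibot is passed to weave as well.

import { createServer } from "node:http"
import { weave } from "@gqloom/core"
import { ValibotWeaver } from "@gqloom/valibot"
import { createYoga } from "graphql-yoga"
import { postResolver, userResolver } from "./resolvers" // hypothetical module path

// Weave the resolvers into an executable GraphQL schema.
const schema = weave(ValibotWeaver, userResolver, postResolver)

const yoga = createYoga({ schema })
createServer(yoga).listen(4000, () => {
  console.info("Server is running on http://localhost:4000/graphql")
})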