Data Loaders
Due to the flexibility of GraphQL, loading the objects related to a given object often requires executing multiple queries. This leads to the notorious N+1 query problem. To solve it, we can use DataLoader. DataLoader can merge multiple requests into a single one, reducing the number of database queries, and it also caches query results to avoid redundant fetches.
The N+1 Query Problem
Consider a scenario where we need to query all users and their respective posts. Our data table structure is as follows:
import { drizzleSilk } from "@gqloom/drizzle"
import { relations } from "drizzle-orm"
import * as t from "drizzle-orm/pg-core"

export const roleEnum = t.pgEnum("role", ["user", "admin"])

export const users = drizzleSilk(
  t.pgTable("users", {
    id: t.serial().primaryKey(),
    createdAt: t.timestamp().defaultNow(),
    email: t.text().unique().notNull(),
    name: t.text(),
    role: roleEnum().default("user"),
  })
)

export const usersRelations = relations(users, ({ many }) => ({
  posts: many(posts),
}))

export const posts = drizzleSilk(
  t.pgTable("posts", {
    id: t.serial().primaryKey(),
    createdAt: t.timestamp().defaultNow(),
    updatedAt: t
      .timestamp()
      .defaultNow()
      .$onUpdateFn(() => new Date()),
    published: t.boolean().default(false),
    title: t.varchar({ length: 255 }).notNull(),
    authorId: t.integer().notNull(),
  })
)

export const postsRelations = relations(posts, ({ one }) => ({
  author: one(users, { fields: [posts.authorId], references: [users.id] }),
}))
A straightforward resolver implementation might look like this:
import { field, query, resolver } from "@gqloom/core"
import { eq } from "drizzle-orm"
import { db } from "src/db"
import { posts, users } from "src/schema"

export const userResolver = resolver.of(users, {
  users: query(users.$list()).resolve(() => db.select().from(users)),

  posts: field(posts.$list())
    .derivedFrom("id")
    .resolve((user) =>
      db.select().from(posts).where(eq(posts.authorId, user.id))
    ),
})
When we execute the following query:
query usersWithPosts {
  users {
    id
    name
    posts {
      id
      title
    }
  }
}
The backend execution flow will be:
- Execute one query to fetch the list of all users (SELECT * FROM users).
- For each returned user, execute another query to fetch that user's posts (SELECT * FROM posts WHERE authorId = ?).
If the first query returns N users, then to fetch their posts, we would collectively execute 1 (fetch users) + N (fetch posts for each user) queries. This is known as the "N+1 Query Problem". When N is large, this puts immense pressure on the database, leading to performance bottlenecks.
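To see the problem for yourself, you can enable drizzle-orm's built-in query logger, which prints every SQL statement it executes. Below is a minimal sketch of such a db setup; it assumes a node-postgres connection and a DATABASE_URL environment variable, so adapt it to your own driver and configuration.

import { drizzle } from "drizzle-orm/node-postgres"
import { Pool } from "pg"
import * as schema from "./schema"

const pool = new Pool({ connectionString: process.env.DATABASE_URL })

// With logger: true, drizzle prints each SQL statement it runs, which makes
// the 1 (users) + N (posts per user) query pattern easy to observe.
export const db = drizzle(pool, { schema, logger: true })

Running the usersWithPosts query above with this setup prints one SELECT for the users followed by one SELECT on posts per returned user.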
GQLoom provides powerful tools to elegantly solve this problem.
field().load() Method
The simplest way is to use the field().load() method. It transforms the resolver function from handling a single parent object to handling a batch of parent objects, allowing for bulk data fetching.
The load method accepts an asynchronous function as a parameter. The first parameter of this function is an array of parent objects, parents, and subsequent parameters are the input arguments, args, for that field. This asynchronous function needs to return an array of the same length as the parents array, where each element corresponds to the result for a parent object.
INFO
It is crucial that the returned array strictly matches the order and length of the parents array. DataLoader relies on this order to correctly map results back to each parent object.
Let's look at an example. To solve the N+1 problem mentioned above, we can modify the resolver like this:
import { field, resolver } from "@gqloom/core"
import { inArray } from "drizzle-orm"
import { db } from "src/db"
import { posts, users } from "src/schema"

export const userResolver = resolver.of(users, {
  posts: field(posts.$list())
    .derivedFrom("id")
    .load(async (userList) => {
      // 1. Fetch all posts for the users at once
      const postList = await db
        .select()
        .from(posts)
        .where(
          inArray(
            posts.authorId,
            userList.map((u) => u.id)
          )
        )
      // 2. Group posts by authorId
      const grouped = Map.groupBy(postList, (p) => p.authorId)
      // 3. Map the posts back to each user in order
      return userList.map((u) => grouped.get(u.id) ?? [])
    }),
})
In the code above, the load function receives a userList array. We extract the id of all users and use the inArray operation to fetch all related posts from the database in a single query. Then, we group the posts by authorId and finally map them back to an array whose order matches userList.
Thus, regardless of how many users we request, the query to the posts table will only be executed once.
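As mentioned earlier, the load callback also receives the field's input arguments after the parents array. The following sketch adds a hypothetical published filter to the posts field; it assumes that field().input() accepts a validation schema in the same way as query().input() shown later in this chapter, and that the chaining order below is accepted.

import { field, resolver } from "@gqloom/core"
import { and, eq, inArray } from "drizzle-orm"
import { db } from "src/db"
import { posts, users } from "src/schema"
import * as v from "valibot"

export const userResolver = resolver.of(users, {
  posts: field(posts.$list())
    .derivedFrom("id")
    // Hypothetical input argument; adapt to your validation library
    .input({ published: v.optional(v.boolean()) })
    .load(async (userList, { published }) => {
      // Still a single query for the whole batch, optionally filtered
      const postList = await db
        .select()
        .from(posts)
        .where(
          and(
            inArray(
              posts.authorId,
              userList.map((u) => u.id)
            ),
            published === undefined ? undefined : eq(posts.published, published)
          )
        )
      const grouped = Map.groupBy(postList, (p) => p.authorId)
      return userList.map((u) => grouped.get(u.id) ?? [])
    }),
})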
LoomDataLoader
field().load() is a convenient API provided by GQLoom, which internally creates and manages DataLoader instances for us. However, in some scenarios we might need finer control, or want to share the same data loader instance across different resolvers. In such cases, we can use LoomDataLoader.
GQLoom provides the LoomDataLoader abstract class and the EasyDataLoader convenience class for creating custom data loaders.
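Their shapes look roughly as follows; this is a type-level sketch inferred from how they are used in the examples below, not the exact declarations in @gqloom/core.

// Rough sketch inferred from the examples in this chapter; the actual
// declarations in @gqloom/core may differ in detail.
declare abstract class LoomDataLoader<TKey, TValue> {
  // Implemented by subclasses: resolve a whole batch of keys at once,
  // returning results (or Errors) in the same order as `keys`.
  protected abstract batchLoad(keys: TKey[]): Promise<(TValue | Error)[]>
  // Called per key; calls within the same request are merged and served
  // by a single batchLoad invocation, as described below.
  load(key: TKey): Promise<TValue>
}

declare class EasyDataLoader<TKey, TValue> {
  constructor(batchLoad: (keys: TKey[]) => Promise<(TValue | Error)[]>)
  load(key: TKey): Promise<TValue>
}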
Custom Data Loaders (LoomDataLoader)
We can create a custom data loader by extending LoomDataLoader and implementing the batchLoad method.
import { LoomDataLoader, field, query, resolver } from "@gqloom/core"
import { createMemoization } from "@gqloom/core/context"
import { inArray } from "drizzle-orm"
import { db } from "src/db"
import { posts, users } from "src/schema"
import * as v from "valibot"

// 1. Create a custom DataLoader
export class UserLoader extends LoomDataLoader<
  number,
  typeof users.$inferSelect
> {
  protected async batchLoad(
    keys: number[]
  ): Promise<(typeof users.$inferSelect | Error)[]> {
    const userList = await db
      .select()
      .from(users)
      .where(inArray(users.id, keys))
    const userMap = new Map(userList.map((u) => [u.id, u]))
    return keys.map(
      (key) => userMap.get(key) ?? new Error(`User ${key} not found`)
    )
  }
}

// 2. Use createMemoization to create a shared loader instance within the request
export const useUserLoader = createMemoization(() => new UserLoader())

// 3. Use it in the resolvers
export const postResolver = resolver.of(posts, {
  author: field(users)
    .derivedFrom("authorId")
    .resolve((post) => {
      const loader = useUserLoader()
      return loader.load(post.authorId)
    }),
})

export const userResolver = resolver.of(users, {
  user: query(users)
    .input({ id: v.number() })
    .resolve(({ id }) => {
      const loader = useUserLoader()
      return loader.load(id)
    }),
})
To ensure that each request has an independent data loader instance and to prevent cached data from leaking between requests, we combine the loader with the createMemoization function from Context, as shown in step 2 above. This creates a singleton loader within the lifecycle of each request.
In this example, when useUserLoader() is called multiple times within the same GraphQL request, it will return the same UserLoader instance. Therefore, multiple calls to loader.load(id) will be automatically batched, and the batchLoad function will only be executed once.
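The same batching applies when the loader is used directly. Here is a small sketch of that behavior, assuming the UserLoader class defined above is exported from a hypothetical ./user-loader module:

import { UserLoader } from "./user-loader" // hypothetical module path

const loader = new UserLoader()

// Both loads are issued before awaiting, so they are collected into one batch:
// batchLoad runs once with keys [1, 2].
const [alice, bob] = await Promise.all([loader.load(1), loader.load(2)])
console.info(alice.name, bob.name)

// Results are cached on the instance, so this resolves without another query.
await loader.load(1)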
Convenient Data Loaders (EasyDataLoader)
If you are not a fan of object-oriented programming, you can use EasyDataLoader. It accepts a batchLoad function as a constructor parameter.
The useUserLoader above can be simplified with EasyDataLoader:
import { EasyDataLoader, field, resolver } from "@gqloom/core"
import { createMemoization } from "@gqloom/core/context"
import { inArray } from "drizzle-orm"
import { db } from "src/db"
import { posts, users } from "src/schema"

const useUserLoader = createMemoization(() => {
  return new EasyDataLoader<number, typeof users.$inferSelect>(async (keys) => {
    const userList = await db
      .select()
      .from(users)
      .where(inArray(users.id, keys))
    const userMap = new Map(userList.map((u) => [u.id, u]))
    return keys.map(
      (key) => userMap.get(key) ?? new Error(`User ${key} not found`)
    )
  })
})

// The usage in the resolver remains the same
export const postResolver = resolver.of(posts, {
  author: field(users)
    .derivedFrom("authorId")
    .resolve((post) => {
      const loader = useUserLoader()
      return loader.load(post.authorId)
    }),
})
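Finally, the resolvers are woven into a GraphQL schema and served as usual. Below is a minimal sketch using graphql-yoga; it assumes the resolvers above are exported from a hypothetical ./resolvers module, and since userResolver uses valibot inputs, ValibotWeaver from @gqloom/valibot is passed to weave as well.

import { createServer } from "node:http"
import { weave } from "@gqloom/core"
import { ValibotWeaver } from "@gqloom/valibot"
import { createYoga } from "graphql-yoga"
import { postResolver, userResolver } from "./resolvers" // hypothetical module path

// Weave the resolvers into an executable GraphQL schema.
const schema = weave(ValibotWeaver, userResolver, postResolver)

const yoga = createYoga({ schema })
createServer(yoga).listen(4000, () => {
  console.info("Server is running on http://localhost:4000/graphql")
})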