Dataloader

Due to the flexibility of GraphQL, when we load the associated objects of a certain object, we usually need to execute multiple queries. This leads to the well-known N+1 query problem. To solve this problem, we can use DataLoader.

DataLoader can combine multiple requests into one request, thus reducing the number of database queries. It can also cache query results to avoid duplicate queries.

Examples

Table Definition

Consider that we have two tables, users and posts, where posts is associated with the id of users through posts.authorId:

import { drizzleSilk } from "@gqloom/drizzle"
import { relations } from "drizzle-orm"
import * as t from "drizzle-orm/pg-core"

export const roleEnum = t.pgEnum("role", ["user", "admin"])

export const users = drizzleSilk(
  t.pgTable("users", {
    id: t.serial().primaryKey(),
    createdAt: t.timestamp().defaultNow(),
    email: t.text().unique().notNull(),
    name: t.text(),
    role: roleEnum().default("user"),
  })
)

export const usersRelations = relations(users, ({ many }) => ({
  posts: many(posts),
}))

export const posts = drizzleSilk(
  t.pgTable("posts", {
    id: t.serial().primaryKey(),
    createdAt: t.timestamp().defaultNow(),
    updatedAt: t
      .timestamp()
      .defaultNow()
      .$onUpdateFn(() => new Date()),
    published: t.boolean().default(false),
    title: t.varchar({ length: 255 }).notNull(),
    authorId: t.integer().notNull(),
  })
)

export const postsRelations = relations(posts, ({ one }) => ({
  author: one(users, { fields: [posts.authorId], references: [users.id] }),
}))

Data Seeding

Let's use drizzle-seed to populate some data into the database:

import { drizzle } from "drizzle-orm/node-postgres"
import { reset, seed } from "drizzle-seed"
import { config } from "../env.config"
import * as schema from "./schema"

async function main() {
  const db = drizzle(config.databaseUrl, { logger: true })
  await reset(db, schema)
  await seed(db, schema).refine(() => ({
    users: {
      count: 20,
      with: {
        posts: [
          { weight: 0.6, count: [1, 2, 3] },
          { weight: 0.3, count: [5, 6, 7] },
          { weight: 0.1, count: [8, 9, 10] },
        ],
      },
    },
  }))
}

main()

Create a Resolver

Let's write a simple resolver for the User object:

import { field, query, resolver, weave } from "@gqloom/core"
import { eq, inArray } from "drizzle-orm"
import { drizzle } from "drizzle-orm/node-postgres"
import { config } from "./env.config"
import * as tables from "./schema"
import { posts, users } from "./schema"

const db = drizzle(config.databaseUrl, { schema: tables, logger: true })

const userResolver = resolver.of(users, {
  users: query(users.$list()).resolve(() => db.select().from(users)),

  posts: field(posts.$list())
    .derivedFrom("id")
    .resolve((user) => db.select().from(posts).where(eq(posts.authorId, user.id))),
})

export const schema = weave(userResolver)

In the above code, we defined a user resolver, which includes:

users query: used to get all users
posts field: used to get all posts of the corresponding user

Here is an example query that will return the information of all users and their corresponding posts:

GraphQL Query

query usersWithPosts {
  users {
    id
    name
    email
    posts {
      id
      title
    }
  }
}

This query will query the posts for each user separately. We populated 20 users in the database in the previous step, so this query will cause 20 queries to the posts table. This is obviously a very inefficient way. Let's see how to use DataLoader to reduce the number of queries.

Using DataLoader

Next, we will use DataLoader to optimize our query.

import { field, query, resolver, weave } from "@gqloom/core"
import { eq, inArray } from "drizzle-orm"
import { drizzle } from "drizzle-orm/node-postgres"
import { config } from "./env.config"
import * as tables from "./schema"
import { posts, users } from "./schema"

const db = drizzle(config.databaseUrl, { schema: tables, logger: true })

const userResolver = resolver.of(users, {
  users: query(users.$list()).resolve(() => db.select().from(users)),

  posts_: field(posts.$list()) 
    .derivedFrom("id") 
    .resolve((user) => db.select().from(posts).where(eq(posts.authorId, user.id))), 

  posts: field(posts.$list()) 
    .derivedFrom("id") 
    .load(async (userList) => { 
      const postList = await db 
        .select() 
        .from(posts) 
        .where( 
          inArray( 
            posts.authorId, 
            userList.map((u) => u.id) 
          ) 
        ) 
      const postMap = Map.groupBy(postList, (p) => p.authorId) 
      return userList.map((u) => postMap.get(u.id) ?? []) 
    }), 
})

export const schema = weave(userResolver)

In the above code, we use field().load() to enable batch data loading. Behind the scenes, this will use DataLoader to batch load data. Inside load(), we implement batch data loading through the following steps:

Use the in operation to get all the posts of the currently loaded users from the posts table at once;
Use Map.groupBy() to group the list of posts by the author ID;
Map the list of users to the list of posts in order. If a user has no posts, return an empty array.

In this way, we combine the original 20 queries into 1 query, thus achieving performance optimization.

It is necessary to ensure that the order of the returned array of the query function is consistent with the order of the IDs array. DataLoader relies on this order to correctly merge the results.