Facebook GraphQL is one of the emerging (web) technologies, giving a new light on the future of web APIs. I'm not going to introduce it here - there is a plenty of articles on that subject, starting from official site. What I'm going to write about is one specific issue of GraphQL design, we've met when working on its F# implementation. But lets start from the beginning.

Designing with capabilities

The most important difference between REST APIs and GraphQL endpoints is an approach to exposing data and available actions to the API consumers.

In RESTful APIs we describe them quite vaguely, as there is no standard approach here, and REST itself is more kind of a guidelines than natural rules. Of course some approaches to that have been already created (i.e. Swagger).

In general we expose our API as set of URLs (routes) which - when called with right set of input arguments and HTTP methods - will reply with the right data in the response.

Problem here is that eventually this always leads to epidemic growth of exposed routes, as we often need a little more (underfetching) or a little less (overfetching) fields than provided with endpoints exposed so far, but we don't want to end up making additional calls to complete missing data.

On the other hand, with GraphQL over- and under-fetching is almost never a problem. Reason for that is simple: instead of exposing endpoints aiming to reply to one particular query, we create a single endpoint an schema, which describes all data - with all possible relationships between entities - that is potentially accessible from corresponding service. Therefore you have all capabilities of your web service in one place.

So, where's the catch?

While all of that sounds simple and fun, the story behind that is a little more grim.

Lets consider a web service with some ORM and SQL-backed database - a quite common example nowadays. One of the well known problems with ORMs is their leaky abstraction. The one I'm going to write here about is known as N+1 SELECTs.

For those of you, who haven't heard of it: this is a common problem with almost all advanced ORMs, which happens when you haven't explicitly mentioned all fields, you're going to use, when sending query to a database. Once it has been executed and you're try to access a property, which has not been fetched from database, most "smart" ORMs will automagically send a subsequent query to retrieve missing data. If you're iterating over collection of those data objects, a new query will be send each time an (not fetched) field has been accessed.

How we solve this problem in RESTful API? Because we exactly know, what that is going to be returned from a route, we can simply tailor a logic able to retrieve all necessary data in a minimal number of queries.

And how about GraphQL? Well... you're fucked. There is no easy way to "tailor" a SQL query to match specific request, as an expected result set is highly dynamic, depending on the incoming GraphQL query string.

Of course, there are multiple approaches to solve this problem among many different implementations in multiple languages. Lets take a look at some of them.

Dedicated GraphQL-to-SQL library

Existing examples, I know about:

  • postgraph (node.js), which exposes PostgreSQL database schema into GraphQL, and is able to compile GraphQL queries into SQL.
  • GraphQL.NET (.NET), which translates GraphQL queries into LINQ.

Both of them have the same problems:

  • They specialize in translating one query language to another. This solves only a subset of problems. What if you need to merge response from database with a call to/from another web service?
  • They may usually introduce tight coupling between your Data Access Layer (or even database schema itself) and the model exposed on the endpoint.
  • They are separate libraries, usually hardly compatible with other, more general, GraphQL implementations.

Throttling and batching queries

Second kind - introduced in Ruby implementation - digs into backend of the ORM itself. What it did was splitting DB query execution into small timeframes. Therefore instead of immediate SQL query execution, we batch all requests within, lets say 10ms time frame, and execute them at once.

I won't cover that idea in detail, as it's more like the feature of underlying database provider than an application server implementation itself. Beside it feels a little hacky solution too ;)

Can we do more?

One of the major problems here is an interpretation of incoming GraphQL query. While most of the implementations expose info about things such as AST of the executed query and schema itself, usually that data is not correlated in any way.

In the FSharp.Data.GraphQL, we have created something called execution plan - it's an intermediate representation of GraphQL query, which includes things like inlining query fragment's fields and correlating them with schema and type system.

To show this on an example- lets say we have following type system described in our schema:

schema { query: Query }

type Query {
    users: [User!]
}

type User {
    id: ID!
    info: UserInfo!
    contacts: [UserInfo!]
}

type UserInfo {
    firstName: String
    lastName: String
}

And we want to execute a query, which looks like this:

query Example {
    users {
        id
        ...fname
        contacts {
            ...fname
        }
    }
}

fragment fname on UserInfo {
    firstName
}

How an execution plan for that query looks like?

ResolveCollection: users of [User!]
    SelectFields: users of User!
        ResolveValue: id of ID!
        SelectFields: info of UserInfo!
            ResolveValue: firstName of String
        ResolveCollection: contacts of [UserInfo!]
            SelectFields: contacts of UserInfo!
                ResolveValue: firstName of String

A little description:

  • ResolveValue is a leaf of the expression plan tree. It marks scalar or enum coercion.
  • SelectFields is used to mark retrieval of set of fields from object types.
  • ResolveCollection marks a node that is going to return a list of results.
  • ResolveAbstraction (not present on the example) is a map of object type ⇒ SelectFields set used for abstract types (unions and interfaces).

As we can see, information about query fragment has been erased (fragment's fields have been inlined), and every field contains a type information. What we have in a result is fully declarative representation of our query. What can we do from here?:

  • Standard execution path is trivial - we simply make a tree traversal and execute every field's resolve function on the way.
  • We can decide to use an interpreter to create another representation from it (i.e. LINQ/SQL query).
  • Since execution plan is build as a tree, we could "manually" add/remove nodes from it, or even split it in two, and then again interpret each part separately. Having a set operations (such as unions and intersections) over them is especially promising.
  • We could cache the execution plans, reducing number of operations necessary to perform when executing each query. This can be a big advantage in the future - especially when combined with publish/subscribe pattern.

While presence of intermediate representation doesn't solve all problems, I think it's a step in the right direction, as it allows us to compose our server logic in more declarative fashion without headaches about performance issues.