
Neural Graph Databases
by Michael Galkin | Mar 2023

What’s New in Graph ML?

A new milestone in graph data management

We introduce the concept of Neural Graph Databases as the next step in the evolution of graph databases. Tailored for large incomplete graphs and on-the-fly inference of missing edges using graph representation learning, neural reasoning maintains high expressiveness and supports complex logical queries similar to standard graph query languages.

Image by Authors, assisted by Stable Diffusion.
  1. Neural Graph Databases: What and Why?
  2. The blueprint of NGDBs
  3. Neural Graph Storage
  4. Neural Query Engine
  5. Neural Graph Reasoning for Query Engines
  6. Open Challenges for NGDBs
  7. Learn More

🍨 Vanilla graph databases are pretty much everywhere thanks to the ever-growing graphs in production, flexible graph data models, and expressive query languages. Classical, symbolic graph DBs are fast and cool under one important assumption:

Completeness. Query engines assume that graphs in classical graph DBs are complete.

Under the completeness assumption, we can build indexes, store graphs in a variety of read/write-optimized formats, and expect the DB to return what is there.

But this assumption does not often hold in practice (we’d say, doesn’t hold way too often). If we look at some prominent knowledge graphs (KGs): in Freebase, 93.8% of people have no place of birth and 78.5% have no nationality, about 68% of people do not have any profession, while in Wikidata, about 50% of artists have no date of birth, and only 0.4% of known buildings have information about height. And that’s for the largest KG openly curated by hundreds of enthusiasts. Surely, 100M nodes and 1B statements are not the largest ever graph in the industry, so you can imagine the degree of incompleteness there.

Clearly, to account for incompleteness, in addition to “what is there?” we have to also ask “what is missing?” (or “what can be there?”). Let’s look at the example:

(a) – input query; (b) — incomplete graph with predicted edges (dashed lines); (c) — a SPARQL query returning one answer (UofT) via graph traversal; (d) — neural execution that recovers missing edges and returns two new answers (UdeM, NYU). Image by Authors.

Here, given an incomplete graph (edges (Turing Award, win, Bengio) and (Deep Learning, field, LeCun) are missing) and a query “At what universities do the Turing Award winners in the field of Deep Learning work?” (expressed in a logical form or in some language like SPARQL), a symbolic graph DB would return only one answer, UofT, reachable by graph traversal. We refer to such answers as easy answers, or existing answers. Accounting for missing edges, we would recover two more answers, UdeM and NYU (hard answers, or inferred answers).
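For reference, one way to write this query as an existentially quantified first-order formula (our rendering of the example above; we assume a university relation linking researchers to the institutions where they work):

```latex
q = U_?\, .\; \exists V:\ \mathrm{win}(\mathrm{TuringAward}, V)\ \land\ \mathrm{field}(\mathrm{DeepLearning}, V)\ \land\ \mathrm{university}(V, U_?)
```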

How to infer missing edges?

  • In classical DBs, we don’t have much choice. RDF-based databases have formal semantics and can be backed by hefty OWL ontologies, but, depending on the graph size and the complexity of inference, reasoning under SPARQL entailment regimes may take a practically unbounded amount of time. Labeled Property Graph (LPG) databases have no built-in means for inferring missing edges at all.
  • Thanks to the advances in Graph Machine Learning, we can often perform link prediction in a latent (embedding) space in linear time! We can then extend this mechanism to executing complex, database-like queries right in the embedding space (sketched right below).
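To make “link prediction in a latent space” concrete, here is a minimal sketch with TransE-style scoring (random embeddings stand in for a trained model; the entity and relation IDs are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, dim = 10_000, 50, 128

# Stand-ins for embeddings learned by some KG embedding model.
entity_emb = rng.normal(size=(num_entities, dim))
relation_emb = rng.normal(size=(num_relations, dim))

def predict_tails(head_id: int, rel_id: int, k: int = 5):
    """Score every entity as a candidate tail for (head, rel, ?) and return the top-k.

    TransE-style score: -||h + r - t||. One pass over the entity table,
    i.e., linear in the number of entities.
    """
    query_vec = entity_emb[head_id] + relation_emb[rel_id]
    distances = np.linalg.norm(entity_emb - query_vec, axis=1)
    top = np.argsort(distances)[:k]
    return list(zip(top.tolist(), (-distances[top]).round(3).tolist()))

# E.g., completing (TuringAward, win, ?) if entity 0 were TuringAward
# and relation 3 were `win` in the learned tables.
print(predict_tails(head_id=0, rel_id=3))
```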

Neural Graph Databases combine the advantages of traditional graph DBs with modern graph machine learning.

That is, DB principles like (1) graphs as a first-class citizen, (2) efficient storage, and (3) a uniform querying interface are now backed by Graph ML techniques such as (1) geometric representations, (2) robustness to noisy inputs, and (3) large-scale pretraining and fine-tuning, in order to bridge the incompleteness gap and enable neural graph reasoning and inference.

In general, the design principles for NGDBs are:

  • The data incompleteness assumption — the underlying data might have missing information on node-, link-, and graph-levels which we would like to infer and leverage in query answering;
  • Inductiveness and updatability — similar to traditional databases that allow updates and instant querying, representation learning algorithms for building graph latents have to be inductive and generalize to unseen data (new entities and relations at inference time) in a zero-shot (or few-shot) manner to prevent costly re-training (for instance, of shallow node embeddings);
  • Expressiveness — the ability of latent representations to encode logical and semantic relations in the data akin to FOL (or its fragments) and leverage them in query answering. Practically, the set of supported logical operators for neural reasoning should be close to or equivalent to standard graph database languages like SPARQL or Cypher;
  • Multimodality beyond knowledge graphs — any graph-structured data that can be stored as a node or record in classical databases (consisting, for example, of images, texts, molecular graphs, or timestamped sequences) and can be imbued with a vector representation is a valid source for the Neural Graph Storage and Neural Query Engine.

The key methods to address the NGDB principles are:

  • Vector representation as the atomic element — while traditional graph DBs hash the adjacency matrix (or edge list) in many indexes, the incompleteness assumption implies that both given edges and graph latents (vector representations) become the sources of truth in the Neural Graph Storage;
  • Neural query execution in the latent space — basic operations such as edge traversal cannot be performed purely symbolically due to the incompleteness assumption. Instead, the Neural Query Engine operates on both the adjacency structure and graph latents to incorporate possibly missing data into query answering.

In fact, by answering queries in the latent space (and not sacrificing traversal performance) we can ditch symbolic database indexes altogether.

The main difference between symbolic graph DBs and neural graph DBs: traditional DBs answer the question “What is there?” by edge traversal while neural graph DBs also answer “What is missing?”. Image by Authors.

Before diving into NGDBs, let’s take a look at neural databases in general — turns out they have been around for a while and you might have noticed that. Many machine learning systems already operate in this paradigm when data is encoded into model parameters and querying is equivalent to a forward pass that can output a new representation or prediction for a downstream task.

Neural Databases: Overview

What is the current state of neural databases? What are the differences between its kinds and what’s special about NGDBs?

Differences between Vector DBs, natural language DBs, and neural graph DBs. Image by Authors
  1. Vector databases belong to the family of storage-oriented systems commonly built around approximate nearest neighbor (ANN) libraries like Faiss or ScaNN (or custom solutions) to answer distance-based queries using Maximum Inner-Product Search (MIPS), L1, L2, or other distances. Being encoder-independent (that is, any encoder yielding vector representations, such as a ResNet or BERT, can be a source), vector databases are fast but lack complex query answering capabilities (a minimal retrieval sketch follows this list).
  2. With the recent rise of large-scale pretrained models — or foundation models — we have witnessed their huge success in natural language processing and computer vision tasks. We argue that such foundation models are also a prominent example of neural databases. There, the storage module might be represented directly by model parameters or outsourced to an external index, as often used in retrieval-augmented models, since encoding all world knowledge even into billions of model parameters is hard. The query module performs in-context learning either via filling in the blanks in encoder models (BERT or T5 style) or via prompts in decoder-only models (GPT-style) that can span multiple modalities, e.g., learnable tokens for vision applications or even calls to external tools.
  3. Natural Language Databases (NLDB), introduced by Thorne et al., model atomic elements as textual facts encoded to vectors via a pre-trained language model (LM). Queries to an NLDB are sent as natural language utterances that get encoded to vectors, and query processing employs the retriever-reader approach.
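To ground point 1, here is a minimal sketch of distance-based retrieval with Faiss (random vectors stand in for encoder outputs; an exact inner-product index rather than a tuned ANN structure):

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim, num_items = 128, 100_000
rng = np.random.default_rng(0)

# Item vectors produced by any encoder (ResNet, BERT, a KG embedding model, ...).
items = rng.normal(size=(num_items, dim)).astype("float32")

index = faiss.IndexFlatIP(dim)  # exact Maximum Inner-Product Search
index.add(items)

query = rng.normal(size=(1, dim)).astype("float32")
scores, ids = index.search(query, 10)  # top-10 most similar items
print(ids[0], scores[0])
```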

Neural Graph Databases are not a novel term — many graph ML approaches have tried to combine graph embeddings with database indexes; RDF2Vec and LPG2Vec are perhaps the most prominent examples of how embeddings can be plugged into existing graph DBs and run on top of symbolic indexes.

In contrast, we posit that NGDBs can work without symbolic indexes right in the latent space. As we show below, there exist ML algorithms that can simulate exact edge traversal-like behavior in embedding space to retrieve “what is there” as well as perform neural reasoning to answer “what is missing”.

Neural Graph Databases: Architecture

A conceptual scheme of Neural Graph Databases. An input query is processed by the Neural Query Engine where the Planner derives a computation graph of the query and the Executor executes the query in the latent space. The Neural Graph Storage employs Graph Store and Feature Store to obtain latent representations in the Embedding Store. The Executor communicates with the embedding store to retrieve and return results. Image by Authors

At a high level, an NGDB contains two main components: the Neural Graph Storage and the Neural Query Engine. The query answering pipeline starts with a query sent by some application or downstream task, already in a structured format (obtained, for example, via semantic parsing if the initial query is posed in natural language).

The query first arrives at the Neural Query Engine and, in particular, at the Query Planner module. The task of the Query Planner is to derive an efficient computation graph of atomic operations (projections and logical operations) with respect to the query complexity, prediction tasks, and the underlying data storage (such as possible graph partitioning).

The derived plan is then sent to the Query Executor that encodes the query in a latent space, executes the atomic operations over the underlying graph and its latent representations, and aggregates the results of atomic operations into a final answer set. The execution is done via the Retrieval module that communicates with the Neural Graph Storage.
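As a purely illustrative example of a Planner’s output for the running Turing Award query (the operator names and structure here are our own sketch, not a fixed NGDB API), the computation graph of projections and an intersection could be represented as:

```python
# Computation graph for:
# "At what universities do the Turing Award winners in the field of Deep Learning work?"
# Leaves are anchor entities; `project` follows a relation; `intersect` is a logical AND.
query_plan = {
    "op": "project",               # final hop: winners -> universities
    "relation": "university",
    "input": {
        "op": "intersect",         # researchers satisfying both branches
        "inputs": [
            {"op": "project", "relation": "win",   "anchor": "TuringAward"},
            {"op": "project", "relation": "field", "anchor": "DeepLearning"},
        ],
    },
}
```

The Executor can then walk such a structure step by step (atomic execution) or encode it as a whole (global execution), as discussed below.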

The storage layer consists of:

1️⃣ Graph Store for keeping the multi-relational adjacency matrix in a space- and time-efficient manner (e.g., in sparse formats like COO and CSR);

2️⃣ Feature Store for keeping node- and edge-level multimodal features associated with the underlying graph.

3️⃣ Embedding Store that leverages an Encoder module to produce graph representations in a latent space based on the underlying adjacency and associated features.

The Retrieval module queries the encoded graph representations to build a distribution of potential answers to atomic operations.
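A minimal sketch of such a Graph Store using SciPy sparse matrices (toy integer IDs; one adjacency matrix per relation, built in COO and converted to CSR for fast row slicing):

```python
import numpy as np
from scipy.sparse import coo_matrix

num_nodes = 5
# (head, relation, tail) triples as integer IDs, e.g., from a dictionary encoding.
triples = np.array([
    [0, 0, 1],   # toy IDs: (TuringAward, win, Hinton)
    [0, 0, 2],   # (TuringAward, win, Bengio)
    [3, 1, 2],   # (DeepLearning, field, Bengio)
])

adj_per_relation = {}
for rel in np.unique(triples[:, 1]):
    mask = triples[:, 1] == rel
    heads, tails = triples[mask, 0], triples[mask, 2]
    data = np.ones(mask.sum(), dtype=np.int8)
    adj_per_relation[int(rel)] = coo_matrix(
        (data, (heads, tails)), shape=(num_nodes, num_nodes)
    ).tocsr()

# Symbolic traversal ("what is there?"): tails reachable from node 0 via relation 0.
print(adj_per_relation[0][0].indices)  # -> [1 2]
```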

In traditional graph DBs (right), queries are optimized into a plan (often, a tree of join operators) and executed against the storage of DB indexes. In Neural Graph DBs (left), we encode the query (or its steps) in a latent space and execute against the latent space of the underlying graph. Image by Authors.

In traditional graph DBs, storage design often depends on the graph modeling paradigm.

The two most popular paradigms are Resource Description Framework (RDF) graphs and Labeled Property Graphs (LPG). We posit, however, that the newer RDF-star (and the accompanying SPARQL-star) is going to unify the two, merging the logical expressiveness of RDF graphs with the attributed nature of LPGs. Many existing KGs already follow an RDF-star(-like) paradigm, such as hyper-relational KGs and the Wikidata Statement Model.

If we are to envision the backbone graph modeling paradigm in the next years, we’d go for RDF-star.

In the Neural Graph Storage, both the input graph and its vector representations are sources of truth. For answering queries in the latent space, we need:

  • Query Encoder
  • Graph Encoder
  • Retrieval mechanism to match query representation against the graph representation

The graph encoding (embedding) process can be viewed as a compression step that nevertheless preserves the semantic and structural similarity of entities/relations: the distance between entities/relations in the embedding space should be positively correlated with their semantic/structural similarity. There are many options for the encoder architecture — we recommend sticking to inductive ones to adhere to the NGDB design principles. In our recent NeurIPS 2022 work, we presented two such inductive models.

Query encoding is usually matched to the graph encoding so that both end up in the same space. Once we have latent representations, the Retrieval module kicks in to extract relevant answers.

The retrieval process can be seen as a nearest neighbor search for the input vector in the embedding space and has 3 direct benefits (a small scoring sketch follows the list):

  1. Confidence scores for each retrieved item — thanks to a predefined distance function in the embedding space
  2. Different definitions of the latent space and the distance function — catering for different graphs, e.g., tree-like graphs are easier to work with in hyperbolic spaces
  3. Efficiency and scalability — retrieval scales to extremely large graphs with billions of nodes and edges
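On benefit 1, a minimal sketch of turning retrieval distances into per-answer confidence scores (a softmax over negative distances; one illustrative choice among many):

```python
import numpy as np

def confidences_from_distances(distances: np.ndarray) -> np.ndarray:
    """Map smaller-is-better distances to a normalized confidence distribution."""
    logits = -distances
    logits -= logits.max()          # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# E.g., distances of three candidate answers (say UofT, UdeM, NYU) to the query embedding.
dists = np.array([0.3, 0.9, 2.5])
print(confidences_from_distances(dists).round(3))  # -> roughly [0.603 0.331 0.067]
```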
Query planning in NGDBs (left) and traditional graph DBs (right). The NGDB planning (assuming incomplete graphs) can be performed autoregressively step-by-step (1) or generated entirely in one step (2). The traditional DB planning is cost-based and resorts to metadata (assuming complete graphs and extracted from them) such as the number of intermediate answers to build a tree of join operators. Image by Authors

In traditional DBs, a typical query engine performs three major operations. (1) Query parsing to verify syntax correctness (often enriched with a deeper semantic analysis of query terms); (2) Query planning and optimization to derive an efficient query plan (usually, a tree of relational operators) that minimizes computational costs; (3) Query execution that scans the storage and processes intermediate results according to the query plan.

It is rather straightforward to extend those operations to NGDBs.

1️⃣ Query Parsing can be achieved via semantic parsing to a structured query format. We intentionally leave the discussion on a query language for NGDBs for future works and heated public discussions 😉

2️⃣ Query Planner derives an efficient query plan of atomic operations (projections and logical operators) maximizing completeness (all answers over existing edges must be returned) and inference (of missing edges predicted on the fly), taking into account the query complexity and the underlying graph.

3️⃣ Once the query plan is finalized, the Query Executor encodes the query (or its parts) into a latent space, communicates with the Graph Storage and its Retrieval module, and aggregates intermediate results into the final answer set. There exist two common mechanisms for query execution:

  • Atomic, resembling traditional DBs, where a query plan is executed sequentially by encoding atomic patterns, retrieving their answers, and executing logical operators as intermediate steps (sketched below);
  • Global, where the entire query graph is encoded and executed in the latent space in one step.
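Below is a small, purely illustrative sketch of the atomic mechanism in a neuro-symbolic flavor: per-entity scores act as fuzzy truth values, projections reuse the TransE-style scoring from above, and intersection is a product t-norm (loosely in the spirit of methods like CQD, not their exact algorithm). It consumes a plan like the one sketched earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, dim = 200, 32  # tiny toy graph
entity_emb = rng.normal(size=(num_entities, dim))
relation_emb = {r: rng.normal(size=dim) for r in ("win", "field", "university")}
entity_id = {"TuringAward": 0, "DeepLearning": 1}  # toy dictionary encoding

def one_hot(name: str) -> np.ndarray:
    scores = np.zeros(num_entities)
    scores[entity_id[name]] = 1.0
    return scores

def project(source_scores: np.ndarray, relation: str) -> np.ndarray:
    """Fuzzy relation projection: each candidate tail keeps the best
    source_score * link_score over all sources. A real engine would expand
    only top-scoring sources (beam search) instead of this dense N x N matrix."""
    r = relation_emb[relation]
    dists = np.linalg.norm(entity_emb[:, None, :] + r - entity_emb[None, :, :], axis=-1)
    link_scores = 1.0 / (1.0 + dists)          # squash distances into (0, 1]
    return (source_scores[:, None] * link_scores).max(axis=0)

# Atomic execution of the example plan, step by step.
winners   = project(one_hot("TuringAward"), "win")
dl_people = project(one_hot("DeepLearning"), "field")
both      = winners * dl_people                # product t-norm as logical AND
answers   = project(both, "university")        # final projection to universities
print(np.argsort(-answers)[:3])                # top-3 candidate answer entities
```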

The main challenge for neural query execution is matching the expressiveness of symbolic languages like SPARQL or Cypher — so far, neural methods can execute queries of roughly First-Order Logic expressiveness, which is only about halfway to full-fledged symbolic query languages.

The literature on neural methods for complex logical query answering (aka query embedding) has been growing since 2018 and the seminal NeurIPS work of Hamilton et al. on Graph Query Embedding (GQE). GQE was able to answer conjunctive queries with intersections and predict missing links on the fly.

GQE can be considered as the first take on Neural Query Engines for NGDBs.

GQE started a whole subfield of Graph Machine Learning, followed by prominent examples like Query2Box (ICLR 2020) and Continuous Query Decomposition (ICLR 2021). We undertook a major effort to categorize all those (about 50) works along 3 main directions:

⚛️ Graphs — what is the underlying structure against which we answer queries;
🛠️ Modeling — how we answer queries and which inductive biases are employed;
🗣️ Queries — what we answer, what are the query structures and what are the expected answers.

The taxonomy of neural approaches for complex logical query answering. See the paper for more details. Image by Authors

⚛️ Talking about Graphs, we further break them down into Modality (classic triple-only graphs, hyper-relational graphs, hypergraphs, and more), Reasoning Domain (discrete entities or including continuous outputs), and Semantics (how neural encoders capture higher-order relationships like OWL ontologies).

🛠️ In Modeling, we follow the Encoder-Processor-Decoder paradigm, classifying inductive biases of existing models, e.g., transductive or inductive encoders with neural or neuro-symbolic processors.

🗣️ In Queries, we aim at mapping the set of queries answerable by neural methods to that of symbolic graph query languages. We talk about Query Operators (going beyond standard And/Or/Not), Query Patterns (from chain-like queries to DAGs and cyclic patterns), and Projected Variables (your favorite relational algebra).

Analyzing the taxonomy, we find that there is no silver bullet at the moment, e.g., most processors can only work in the discrete mode with tree-based queries. But it also means there is a lot of room for future work — possibly your contribution!

To be more precise, here are the main NGDB challenges for the years ahead.

Along the Graph branch:

  • Modality: Supporting more graph modalities: from classic triple-only graphs to hyper-relational graphs, hypergraphs, and multimodal sources combining graphs, texts, images, and more.
  • Reasoning Domain: Supporting logical reasoning and neural query answering over temporal and continuous (textual and numerical) data — literals constitute a major portion of graphs, and so do relevant queries over literals.
  • Background Semantics: Supporting complex axioms and formal semantics that encode higher-order relationships between (latent) classes of entities and their hierarchies, e.g., enabling neural reasoning over description logics and OWL fragments.

In the Modeling branch:

  • Encoder: Inductive encoders supporting unseen relations at inference time — this is key for (1) updatability of neural databases without the need for retraining; (2) enabling the pretrain-finetune strategy, generalizing query answering to custom graphs with custom relational schemas.
  • Processor: Expressive processor networks able to effectively and efficiently execute complex query operators akin to SPARQL and Cypher operators. Improving the sample efficiency of neural processors is crucial for the training time vs. quality tradeoff — reducing training time while maintaining high predictive quality.
  • Decoder: So far, all neural query answering decoders operate exclusively on discrete nodes. Extending the range of answers to continuous outputs is crucial for answering real-world queries.
  • Complexity: As the main computational bottleneck of processor networks is the dimensionality of embedding space (for purely neural models) and/or the number of nodes (for neuro-symbolic), new efficient algorithms for neural logical operators and retrieval methods are the key to scaling NGDBs to billions of nodes and trillions of edges.

In Queries:

  • Operators: Neuralizing more complex query operators matching the expressiveness of declarative graph query languages, e.g., supporting Kleene plus and star, property paths, filters.
  • Patterns: Answering more complex patterns beyond tree-like queries. This includes DAGs and cyclic graphs.
  • Projected Variables: Allowing projection of more than the final leaf node entity, that is, returning intermediate variables, relations, and multiple variables organized in tuples (bindings).
  • Expressiveness: Answering queries outside simple EPFO and EFO fragments and aiming for the expressiveness of database languages.

Finally, in Datasets and Evaluation:

  • The need for larger and more diverse benchmarks covering more graph modalities, more expressive query semantics, more query operators, and more query patterns.
  • As the existing evaluation protocol appears to be limited (focusing only on inferring hard answers), there is a need for a more principled evaluation framework and metrics covering various aspects of the query answering workflow.

Pertaining to the Neural Graph Storage and NGDB in general, we identify the following challenges:

  • The need for a scalable retrieval mechanism to scale neural reasoning to graphs of billions of nodes. Retrieval is tightly connected to the Query Processor and its modeling priors. Existing scalable ANN libraries can only work with basic L1, L2, and cosine distances, which limits the space of possible processors in the neural query engine.
  • Currently, all complex query datasets provide a hardcoded query execution plan that might not be optimal. There is a need for a neural query planner that would transform an input query into an optimal execution sequence taking into account prediction tasks, query complexity, type of the neural processor, and configuration of the Storage layer.

Related to encoder inductiveness and updatability without retraining, there is a need to alleviate the issues of continual learning, catastrophic forgetting, and size generalization when running inference on graphs much larger than the training ones.

NGDB is still an emerging concept with many open challenges for future research. If you want to learn more about NGDB, feel free to check out

  • 📜 our paper (arxiv),
  • 🌐 our website,
  • 🔧 GitHub repo with the most up-to-date list of relevant papers, datasets, and categorization — feel free to open issues and PRs.

We will also be organizing workshops, stay tuned for the updates!

