Migrate from Relational Database to Graph Database
Discover the process of migrating from a relational database to a graph database. This guide covers schema analysis, data transformation, and optimization techniques for AI/ML workflows.
Knowledge graphs have become a game-changer in building Retrieval-Augmented Generation (RAG) applications, often referred to as GraphRAG. These applications enhance the reasoning capabilities of large language models (LLMs) by providing structured context from a knowledge base. By organizing information into a graph format, knowledge graphs allow for more interconnected and structured data, enabling LLMs to retrieve relevant context with greater accuracy. Recent research shows that this approach leads to more informed and contextually appropriate responses from LLMs, especially when handling complex queries that require deep understanding and reasoning across various domains.

To build a knowledge graph, information is structured into nodes and edges. Nodes represent entities or concepts, while edges represent the relationships between them. However, building a knowledge graph from unstructured data or raw text can be challenging. This is where knowledge graph tools become essential, playing a crucial role in extracting, organizing, and managing knowledge from unstructured sources. In this article, I will provide a comprehensive overview of knowledge graph tools and explain how they facilitate the creation and management of knowledge graphs for your AI applications.

Knowledge Graph vs Graph Database

Before we dive in, let's clarify a concept that is often confused: the difference between a knowledge graph and a graph database. A knowledge graph captures facts, usually in the form of a triplet (subject-predicate-object). In contrast, a graph database is primarily designed for efficiently storing and querying graphs.

- Knowledge Graph: Focuses on the semantic representation of knowledge. Encompasses entities, relationships, and attributes, enabling a more contextual understanding of data. Often used for applications like search engines and recommendation systems.
- Graph Database: Primarily designed for storing and querying data using graph structures. Focuses on efficiently managing connections between data points. Utilized to store knowledge graphs.

What is a Knowledge Graph Tool?

A knowledge graph tool is software or a platform that allows you to create, visualize, and utilize knowledge graphs. These tools enable you to model data, define relationships, and extract valuable insights, making them essential for building knowledge graph-powered applications.

Functions of Knowledge Graph Tools

- Data Modeling: Allows users to design the structure of their knowledge graph, defining entities, relationships, and attributes.
- Data Integration: Supports the integration of data from diverse sources, including relational databases and APIs.
- Querying: Provides robust querying capabilities, often utilizing specialized query languages like Cypher to extract information.
- Visualization: Enables users to visualize the graph, making it easier to understand relationships and patterns within the data.
- Analytics: Incorporates machine learning and analytics features to derive insights and identify trends from the graph.

Simply put, knowledge graph tools form the ecosystem of technologies needed to simplify working with knowledge graphs.
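To make the triplet idea concrete, here is a minimal sketch that stores one (subject-predicate-object) fact using the FalkorDB Python client. The graph name and the relationship type are illustrative assumptions, not prescribed by any particular tool:

```python
from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)
g = db.select_graph("knowledge")  # hypothetical graph name

# The triplet (Napoleon Bonaparte) -[IMPRISONED_AT]-> (Saint Helena)
# becomes two nodes connected by a directed, typed edge.
# MERGE keeps the fact idempotent if it is loaded twice.
g.query(
    "MERGE (s:Entity {name:'Napoleon Bonaparte'}) "
    "MERGE (o:Entity {name:'Saint Helena'}) "
    "MERGE (s)-[:IMPRISONED_AT]->(o)"
)

# Reading the fact back is a graph pattern match.
res = g.query("MATCH (s)-[:IMPRISONED_AT]->(o) RETURN s.name, o.name")
print(res.result_set)  # [['Napoleon Bonaparte', 'Saint Helena']]
```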
"Frameworks like GraphRAG-SDK combine graph-based data management with LLM-powered AI capabilities, which makes them suitable for complex AI applications that require enhanced output relevance and accuracy."
Guy Korland, FalkorDB CEO

Types of Knowledge Graph Tools

Knowledge graph tools vary, serving different purposes depending on the complexity of the data and the application's requirements. They range from basic graph database systems to comprehensive platforms integrated with machine learning, AI, and visualization capabilities.

- Graph Database Systems: These foundational tools store and manage data in graph formats. An example is FalkorDB, which is optimized for querying relationships between entities in a graph structure. These systems are ideal for businesses that need to analyze interconnected data and perform fast relationship-based queries.
- AI-Integrated Frameworks: Frameworks like GraphRAG-SDK combine graph-based data management with LLM-powered AI capabilities. These tools go beyond simple graph storage by integrating LLMs for reasoning and contextualization, making them suitable for complex AI applications that leverage Retrieval-Augmented Generation (RAG), where knowledge graphs enhance the relevance and accuracy of LLM outputs.
- Domain-Specific Solutions: Specialized tools designed for a particular domain. These platforms often include ontologies and semantic reasoning capabilities to unify and manage data across diverse sources, and they are particularly useful for organizations seeking dynamic knowledge graph construction for AI-driven insights. For instance, tools like Code Graph help you use knowledge graphs to visualize and explore code.
- Dynamic Knowledge Graph Construction Tools: These solutions use natural language processing (NLP) and LLMs to extract entities and relationships from raw data, turning them into structured graph representations that can be used for search, reasoning, and decision-making.
- Visualization Tools: A critical aspect of knowledge graphs is the ability to visualize complex relationships. These include libraries like Cytoscape.js, which lets you build graph visualization systems, and frameworks like FalkorDB Browser, a no-code system for interactive graph visualization. Such tools transform intricate data relationships into user-friendly graphical representations, making it easier to spot patterns and insights.

Each category offers distinct capabilities, from basic graph storage to advanced AI-powered data processing and visualization, catering to different use cases depending on the scale and complexity of your knowledge graph project.

How Do Knowledge Graph Tools Work?

Knowledge graph tools operate through a series of processes that prepare, manage, and enhance data, enabling effective querying and reasoning over complex, interconnected graphs. Here's how these tools typically work.

Data Modeling and Preparation

The first step in building a knowledge graph is defining the structure of the data using schemas or ontologies. This involves identifying the entities (nodes) and relationships (edges) that represent your domain of interest. Ontologies provide the semantic model that defines the types of entities, their attributes, and the relationships between them. This structure ensures that your data is organized in a way that facilitates efficient querying and reasoning across diverse datasets.
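As a small illustration of data modeling in practice, the sketch below turns a toy ontology (Person, Movie, and ACTED_IN, all invented for the example) into node labels, relationship types, and property indexes in FalkorDB. The index syntax matches the CREATE INDEX form used elsewhere in this piece:

```python
from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)
g = db.select_graph("movies")  # hypothetical graph name

# Ontology, informally: Person -[:ACTED_IN]-> Movie.
# Indexes on lookup keys make MATCH-by-property efficient.
g.query("CREATE INDEX FOR (p:Person) ON (p.name)")
g.query("CREATE INDEX FOR (m:Movie) ON (m.title)")

# Instantiate the model: one entity of each type, plus a typed,
# attributed relationship between them.
g.query(
    "CREATE (:Person {name:'Keanu Reeves'})"
    "-[:ACTED_IN {role:'Neo'}]->"
    "(:Movie {title:'The Matrix', released:1999})"
)
```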
Data Storage and ETL (Extract, Transform, Load)

Knowledge graphs often need to integrate data from multiple sources, which may be structured (e.g., relational databases) or unstructured (e.g., text). The ETL process extracts data from these sources, transforms it into a format suitable for the graph, and loads it into a graph database. ETL tools automate the processes of cleaning, merging, and transforming data, ensuring consistency and scalability as data sources grow.
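As a rough sketch of what the "load" step can look like in practice, the snippet below copies rows from a relational table into FalkorDB as nodes and relationships. The SQLite file, table, and graph schema are all invented for illustration:

```python
import sqlite3
from falkordb import FalkorDB

# Extract: read rows from a hypothetical relational 'orders' table.
conn = sqlite3.connect("shop.db")
rows = conn.execute("SELECT customer_id, product_id FROM orders").fetchall()

# Transform + Load: each row becomes two nodes and one relationship.
g = FalkorDB(host="localhost", port=6379).select_graph("shop")
for customer_id, product_id in rows:
    # MERGE keeps the load idempotent if the ETL job is re-run.
    g.query(
        "MERGE (c:Customer {id:$cid}) "
        "MERGE (p:Product {id:$pid}) "
        "MERGE (c)-[:PURCHASED]->(p)",
        {"cid": customer_id, "pid": product_id},
    )
```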
Modern software architectures are complex systems of interconnected components. As projects grow, keeping track of all their moving parts becomes increasingly challenging. Complex control flows, deeply nested structures, and inconsistent naming conventions can overwhelm developers, making it difficult to understand the overall architecture. To simplify these complexities, code visualization has been a cornerstone of software engineering since its early days. By revealing architectural patterns and relationships within the code, it streamlines development, improves collaboration, aids in refactoring, reduces bugs, and accelerates onboarding. In this article, we take a comprehensive deep-dive into code visualization and its approaches, the various tools that exist, and how the combination of AI and knowledge graphs is helping create far more sophisticated code visualization systems.

What is Code Visualization?

Code visualization is the process of transforming code into a visual map of your software system. This map illustrates the complex relationships between different code components, making it easier to understand, analyze, and modify the code. By understanding the code's structure and dependencies visually, you can make more informed decisions about code changes, identify potential issues, and collaborate more effectively with your team. Code visualization goes beyond code documentation and helps you visualize the entirety of your code architecture.

For example, consider a simple e-commerce application. Its codebase might include modules for product catalogs, shopping carts, payment processing, and order management. A dependency graph can visually represent how these modules interact, showing which modules rely on others. This visual representation quickly reveals the interconnectedness of components in the source code, potential bottlenecks, and areas where changes might have unintended consequences.

Code Visualization Diagrams

Code visualization tools typically employ a variety of diagrams to represent different aspects of a software system. Let's explore some common types.

Architecture Diagrams

Architecture diagrams provide a high-level view of a software system's structure. They illustrate the components of the system, how they interact, and the technologies used. These diagrams can include various elements, such as servers, databases, and external services, and they help you understand the overall design and flow of the application. Creating an architecture diagram is often the first step in any project, irrespective of the project's size and complexity. It helps all stakeholders get a basic idea of how the project will be built. If your project has multiple modules and different functionalities, the architecture diagram gives you an overview of what modules are present and how the different functionalities connect. More importantly, looking at an architecture diagram gives you an idea of the flow of data and control in the project.

Dependency Graphs

A dependency is essentially the reliance of a piece of code on another module or library. A dependency graph shows you how the various classes, modules, and libraries used in a project are related to each other. In technical terms, a dependency graph is a directed graph that illustrates the dependencies among various entities within a system. In this graph, each node represents an entity, such as a module or function, while directed edges indicate that one entity depends on another.
For instance, if node A depends on node B, there is a directed edge from B to A, signifying that A cannot function correctly without B being available or completed first.

UML Diagrams

UML (Unified Modeling Language) diagrams are standardized visual representations used to model the architecture, design, and implementation of complex software systems. They serve as a bridge between technical and non-technical stakeholders, simplifying communication about system structure and behavior. UML diagrams include a variety of diagram types, such as class diagrams, sequence diagrams, and activity diagrams, each serving different purposes. For example, class diagrams show the relationships between classes and interfaces, while sequence diagrams visualize the flow of operations within the system. UML diagrams are widely used in both the design and documentation phases of software development, providing a common language for discussing system architecture.

C4 Diagrams

C4 diagrams offer a more structured approach to modeling software architecture. The C4 model stands for Context, Containers, Components, and Code, starting from the high-level context of the system and drilling down to the code level.

- Context Diagram: This top-level diagram provides a high-level overview of the system and its interactions with external entities, such as users and other systems. It helps stakeholders understand the system's scope and its role within a larger ecosystem.
- Container Diagram: This diagram breaks down the system into its major containers, which can be applications, services, or databases. It illustrates how these containers interact with each other and the technologies used, providing a more detailed view suitable for technical audiences.
- Component Diagram: At this level, the focus shifts to the internal structure of each container, detailing the components that make up the container and their interactions. This diagram is crucial for understanding how individual parts contribute to the overall functionality of the system.
- Code Diagram: At the most granular level, this diagram dives into the specifics of the code structure, showcasing classes, interfaces, and their relationships. It serves as a blueprint for developers, linking architectural decisions to actual code implementation.

Benefits of Code Visualization

As we saw above, code visualization helps explain the architecture and dependencies inherent in software systems. However, its benefits go beyond just simplifying code comprehension.

Identify and Fix Bugs More Easily

By visualizing code, you can more easily spot anomalies or unexpected behavior within the codebase. For instance, dependency graphs can help you identify dependencies between components in your codebase and reveal how changes in one component might impact others. Architecture diagrams, on the other hand, can reveal potential areas where data bottlenecks or deadlocks might occur.

Conceptualize Large-Scale Projects

Large-scale software projects often involve many moving parts, making it difficult to see the forest for the trees. Visualization tools allow you to step back and view the project as a whole, helping you to better understand how different components interact and how changes in one part of the system might affect others.

Visualize Dependencies

Understanding the dependencies between different modules or components is pivotal when planning changes, since an edit to one module can ripple through everything that depends on it; a small sketch of this idea follows.
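To make the "ripple" idea concrete, here is a tiny, self-contained Python sketch (the module names are invented) that walks a dependency graph to list everything transitively affected by a change:

```python
# Edges point in the "affects" direction, matching the convention above:
# if A depends on B, changing B affects A, so the edge runs B -> A.
affects = {
    "payments": ["orders"],          # changing payments affects orders
    "catalog": ["cart", "orders"],
    "cart": ["orders"],
    "orders": [],
}

def impacted(module, graph):
    """Return every module transitively affected by a change to `module`."""
    seen, stack = set(), [module]
    while stack:
        for dependent in graph[stack.pop()]:
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen

print(impacted("catalog", affects))  # {'cart', 'orders'}
```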
Code is the foundation of modern software, but as codebases grow in complexity, understanding and navigating them becomes increasingly challenging. A Code Graph is a visual representation of a codebase, leveraging Knowledge Graphs and Large Language Models (LLMs) to map the relationships between code entities, such as functions, variables, and classes. In this article, we explore the core concepts of Code Graphs, deep-dive into how they are created, and explain how they enhance code analysis. We will also showcase FalkorDB's Code Graph tool that enables you to create a deployable Code Graph explorer and query interface from any GitHub repository. Let's dive in!

What is a Code Graph and How Does it Enhance Code Analysis?

A Code Graph is a visual representation of a codebase as a Knowledge Graph that helps one explore entities in code (functions, variables, classes) and their relationships. By mapping out these connections, it becomes easier to understand the structure and flow of the code, identify potential issues, and improve overall code quality. The concept of representing code as a graph data structure has its roots in the early days of software engineering, when researchers explored ways to model and analyze program structure and behavior using graph-based techniques. Modern Code Graphs, incorporating Knowledge Graphs and Large Language Models (LLMs), are a recent development. The emergence of modern Knowledge Graph databases, such as FalkorDB, has made it possible to efficiently store, query, and visualize large-scale code graphs.

This enhances the understanding of code, and its ability to help with code navigation can empower developers with numerous benefits, agnostic of the programming language they are using:

- Improved Understanding: Helps trace the flow of data through functions and identify interconnected components.
- Impact Analysis: Assesses the ripple effects of code changes, predicting potential issues before they arise.
- Autocompletion: Suggests relevant functions, variables, and types based on the current context.
- Code Search: Searches for functionality not just by keywords, but by understanding the relationships between code elements.

The ability of Code Graphs to provide a clear, graphical view of complex code structures makes it simpler to trace code execution paths and call graphs, pinpoint areas of high complexity, and facilitate better debugging and refactoring. We have created a few example Code Graphs and their corresponding visualizations at FalkorDB.

RAG (Retrieval Augmented Generation) for Code Graph Creation

The advent of Knowledge Graphs has made it possible to not only visualize but also traverse and reason over the relationships within a Code Graph. For instance, using appropriate Cypher graph queries (sketched below), you can:

- Discover recursive functions within a codebase;
- Explore methods that are not used at all;
- Find the methods that are most used;
- Discover how one function impacts another.

However, what if you wanted to use natural language queries to do the same? This is where modern LLMs and Retrieval-Augmented Generation (RAG) architecture come into play. By leveraging LLMs, developers can pose natural language questions about their codebase, and the RAG pipeline can potentially help transform these queries into graph queries while also explaining the results retrieved.
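Here is a hedged sketch of what those Cypher queries can look like, assuming a code graph whose schema uses :Function nodes and :CALLS relationships (the real labels depend on how the graph was built):

```python
from falkordb import FalkorDB

g = FalkorDB(host="localhost", port=6379).select_graph("code_graph")

# Recursive functions: a :Function whose CALLS edge loops back to itself.
recursive = g.query("MATCH (f:Function)-[:CALLS]->(f) RETURN f.name")

# Unused methods: functions that nothing calls.
unused = g.query(
    "MATCH (f:Function) WHERE NOT ()-[:CALLS]->(f) RETURN f.name"
)

# Most-used methods: rank functions by incoming call count.
most_used = g.query(
    "MATCH (caller)-[:CALLS]->(f:Function) "
    "RETURN f.name, count(caller) AS calls ORDER BY calls DESC LIMIT 5"
)
```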
Developers can ask questions like: "Which functions are most frequently called in this module?" or "Are there any unused methods in the project?" They can receive insights without needing to master complex query languages, enabling a more accessible way to interact with the codebase. This can unlock several capabilities:

- Improved code navigation
- Understanding of the different modules, functions, classes, and methods
- Creating documentation from code
- Discovering dependencies within the code

RAG architecture works by integrating a retrieval model with a generative model (powered by LLMs), where the retrieval model first fetches the relevant documents or data from a data store based on the input query. The retrieved information is then used as context by the generative model to produce more accurate and contextually relevant outputs, effectively combining search and generation capabilities. With RAG architectures, developers commonly use Vector Databases to retrieve documents by using similarity search. This approach, however, breaks down when it comes to Code Graphs, as we will see below.

Advantages of Using Knowledge Graphs Over Vector Databases for RAG-Powered Code Graphs

To build a RAG pipeline for a Code Graph, developers can provide context to LLMs using either a Vector Database or a Knowledge Graph. Vector Databases store data as high-dimensional vector embeddings, which are numerical representations of unstructured data that capture its semantic meaning. In the case of code exploration, converting a codebase and its elements into vector embeddings enables searching through the vector space to find similar or dissimilar functions based on a query. For instance, the vector embedding for function A might look like [0.12, 0.31, 0.56, 0.88, …, 0.92], while the vector embedding for function B might look like [0.83, 0.66, 0.91, 0.89, …, 0.91]. Using Vector Databases, each data point is converted into a numerical representation, allowing for similarity searches to determine which elements (functions, arguments) are closer or farther apart. This also allows for the use of natural language queries to pinpoint the right section of a codebase.

However, this method breaks down when you try to reason over the codebase or explore relationships between functions, modules, classes, and so on. For this, you need a way to capture relationships between elements of code in a structured way. This is where Knowledge Graphs offer significant advantages:

- Structured Relationships: Knowledge Graphs capture the direct relationships between different code elements, such as inheritance, dependencies, and usage patterns.
- Graph Query: With Knowledge Graphs, you can use a graph query language like Cypher to traverse and analyze the graph. This allows you to identify recursive functions, unused methods, or highly utilized functions, and understand how different parts of the codebase interact with each other.
- Reasoning: Knowledge Graphs support reasoning and inference, allowing you to derive new insights from existing relationships. This is essential for tasks like impact analysis, where you need to understand the potential consequences of changes in the codebase (see the sketch after this list).
- Integration with RAG: When integrated with RAG architecture, Knowledge Graphs can provide rich, contextual information to LLMs, enabling more accurate and contextually relevant outputs. This combination pairs the retrieval precision of graph queries with the language fluency of LLMs.
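To ground the impact-analysis point, here is a small sketch of a transitive-callers query over the same assumed :Function/:CALLS schema (the function name is invented):

```python
from falkordb import FalkorDB

g = FalkorDB(host="localhost", port=6379).select_graph("code_graph")

# Impact analysis: everything that directly or transitively (up to 3 hops)
# calls `parse_config` is potentially affected by changing it.
res = g.query(
    "MATCH (caller:Function)-[:CALLS*1..3]->(f:Function {name:$name}) "
    "RETURN DISTINCT caller.name",
    {"name": "parse_config"},
)
for (caller,) in res.result_set:
    print(caller)
```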
The latest release of FalkorDB v4.0.5 includes a new ability to easily clone graphs. In this blog post we'll be developing a state machine framework where a machine is represented by a graph. Whenever an FSM (finite state machine) is executed, a copy of the initial graph is created and the execution is bound to that dedicated clone. This approach is extremely flexible: one can easily adjust a machine and the changes will be applied to all future executions seamlessly, and in case a modification needs to be A/B tested, a clone of the graph is easily made and compared against a baseline.

Let's get started. We'll create a simple machine which:

1. Downloads a source file
2. Counts the number of lines in that file
3. Deletes the file

State Machine Representation

Our graph representation of a state machine is a simple DAG (directed acyclic graph). Nodes represent states; each machine has a START state and an END state in addition to intermediate states. Every state node contains the following attributes:

- Cmd – the shell command to run
- Description – a short description of the state
- Output – command output (available after execution)
- ExitCode – command exit code (available after execution)

The states are connected to one another via a NEXT directed edge.

Running the State Machine

Once we're ready to execute our FSM, a copy of the DAG is created. This is done automatically via the handy new GRAPH.COPY command, as we don't want to taint our machine "template" with execution-specific information. The execution begins at the START state: our runner executes the state's command, and once the command completes, the runner updates the state's Output and ExitCode attributes and proceeds to the next state. This process repeats itself until the last state is executed (a rough sketch of such a runner appears below).

Conclusions

With very little effort we've been able to build a simple state machine system which takes advantage of a number of FalkorDB's unique features: the ability to store, maintain, and switch effortlessly between thousands of different graphs, and the new feature to quickly create copies of graphs. The source code of this demo is available on GitHub. Continuing with this demo, we would love to explore an integration with one of the already established FSM frameworks; it is our belief that FalkorDB can be an interesting backend for such systems.
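Here is a rough sketch of the runner described above, using the FalkorDB Python client. The graph name, the START naming convention, and the use of the client's underlying Redis connection to issue GRAPH.COPY are all assumptions for illustration, not the demo's actual source:

```python
import subprocess
import uuid
from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)

# Clone the template so execution state never taints the machine definition.
# (Assumes the client exposes its underlying Redis connection; GRAPH.COPY
# can equally be issued via redis-cli.)
run_id = f"fsm_run_{uuid.uuid4().hex[:8]}"
db.connection.execute_command("GRAPH.COPY", "fsm_template", run_id)
g = db.select_graph(run_id)

# Walk the DAG from START, executing each state's shell command.
state = g.query("MATCH (s:State {name:'START'}) RETURN id(s), s.Cmd").result_set[0]
while state:
    node_id, cmd = state
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    # Record the outcome on the cloned state node.
    g.query(
        "MATCH (s:State) WHERE id(s) = $id SET s.Output = $out, s.ExitCode = $code",
        {"id": node_id, "out": proc.stdout, "code": proc.returncode},
    )
    # Follow the NEXT edge; stop when there is no successor.
    nxt = g.query(
        "MATCH (s:State)-[:NEXT]->(n) WHERE id(s) = $id RETURN id(n), n.Cmd",
        {"id": node_id},
    ).result_set
    state = nxt[0] if nxt else None
```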
LLMs today

The potential of using LLMs for knowledge extraction is nothing less than amazing. In the last couple of months we've seen a rush towards integrating large language models to perform a variety of tasks: data summarization, Q&A chat bots, and entity extraction are just a few examples of what people are doing with these models. With this new technology, new disciplines and challenges emerge: What's the proper way to query my data? How can I reduce model hallucinations? Where should I store my data?

Current approach

Vector databases seem to have become the default option for indexing, storing, and retrieving data which will later be presented as context along with a question or a task to the LLM. The flow is quite straightforward: consider a list of documents containing data we would like to query (these can be Wikipedia pages, corporate proprietary knowledge, or a list of recipes). The data is usually chunked into smaller pieces, embeddings are created for each piece, and finally the data along with its embeddings is stored within a Vector database. When it's time to ask a question, e.g. "suggest three Italian recipes which don't contain eggplants for a dinner party of four", the question itself gets embedded into a vector, and the Vector database is asked to provide K (let's say 20) semantically similar vectors (recipes in our case). It is these results from the DB which form a context presented to the LLM along with the original question, in the hope that the context is rich and accurate enough for the LLM to provide suitable answers.

One major flaw with this approach is that it is too limited: the backing DB will only provide results which are semantically "close" to the user question, and as such the generated context can lack vital information needed by the LLM to provide a decent answer.

Alternative

As an alternative, one can use a knowledge graph to not only store and query the original documents but to also capture the different entities and relations embedded within one's data. To utilize a graph DB as a knowledge base for LLMs, we start out by constructing a knowledge graph from our documents. This process includes identifying different entities and the relationships among them, e.g. (Napoleon Bonaparte) – [IMPRISONED] -> (island of Saint Helena). Surprisingly, LLMs can be used for this process as well. Once the graph is constructed, we'll be using it for context construction: a question presented by a user is translated into a graph query. At this point we're not limited to a set of K semantically similar vectors; we can utilize all of the connections stored within our graph to generate a much richer context. It is this context, along with the original question, that is presented to the LLM to get the final answer.

Context extraction – querying the graph for context
Graph generation – entity and relation extraction from raw text

Demo

To put all of the above into practice, I've constructed a demo using LangChain, querying music-related Wikipedia pages and comparing a vector store setup against a knowledge graph (FalkorDB).
from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI
from langchain.graphs import FalkorDBGraph

def query_graph(graph_id, query):
    graph = FalkorDBGraph(graph_id, host="localhost", port=6380)
    graph.refresh_schema()
    chain = GraphCypherQAChain.from_llm(
        cypher_llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
        qa_llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
        graph=graph,
        verbose=True,
    )
    return chain.run(query)

query_graph("music", "Which musician had the most collaborations?")
query_graph("music", "Which two musicians are married?")
query_graph("music", "Which country produced the most talent?")

Q&A

Here are the questions and answers I got from each setup:

Which musician had the most collaborations?
Vector – Mark Hudson did the most collaborations.
Graph – Mark Hudson did the most collaborations, with a total of 8.

Which two musicians are married?
Vector – There is no information provided about any musicians being married to each other.
Graph – Bob Thiele and Teresa Brewer are married musicians.

Which artist won an award?
Vector – Usher won multiple awards, including Grammy Awards and Soul Train Music Awards.
Graph – Usher won a Grammy Award.

Which country produced the most talent?
Vector – The document does not provide information about which country produced the most talent in country music.
Graph – The country that produced the most talent is the United States of America.

Is there an indirect connection between Kylie Minogue and Simon Franglen, and if so, name the artists on that path?
Vector – There is no indirect connection between Kylie Minogue and Simon Franglen.
Graph – Yes, there is an indirect connection between Kylie Minogue and Simon Franglen. The artists on that path are Whitney Houston, Barbra Streisand, Graham Stack, and Rod Stewart.

Conclusions

As can be seen, the Vector database setup managed to answer only 2/5 of the questions. This is quite expected, as the questions asked are not semantically close to the answers; as such, we can't expect the documents retrieved from the Vector DB to contain the information necessary to answer them. On the other hand, the Graph database setup managed to answer all 5 questions. The success of this approach is primarily accounted for by the auto-generated graph queries used to build a much more relevant and richer LLM context.

Although in this example we've seen the Graph do quite well, it is my belief that a more robust solution combines both worlds. This is why FalkorDB has introduced a Vector index as part of its indexing suite: one can now start building a query context using a combination of vector search and graph traversals. Consider a user question which kicks off a vector search that ends up with K nodes, from which graph traversal continues, reaching important fragments of data scattered across the graph.
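Here is a hedged sketch of that combined flow. The vector index DDL and the db.idx.vector.queryNodes procedure follow FalkorDB's vector index interface as I understand it, and the :Artist/:COLLABORATED_WITH schema and the embed() helper are invented for illustration:

```python
from falkordb import FalkorDB

g = FalkorDB(host="localhost", port=6379).select_graph("music")

# One-time: index artist embeddings (dimension must match your model).
g.query(
    "CREATE VECTOR INDEX FOR (a:Artist) ON (a.embedding) "
    "OPTIONS {dimension:384, similarityFunction:'euclidean'}"
)

def embed(text):
    # Placeholder: plug in any sentence-embedding model returning 384 floats.
    raise NotImplementedError

# Vector search finds the K most relevant artists, then a graph traversal
# expands the context through their collaborations.
res = g.query(
    "CALL db.idx.vector.queryNodes('Artist', 'embedding', 5, vecf32($q)) "
    "YIELD node "
    "MATCH (node)-[:COLLABORATED_WITH*1..2]-(other:Artist) "
    "RETURN DISTINCT other.name",
    {"q": embed("Which musician had the most collaborations?")},
)
```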
The ability to scale out a database is crucial, and in this short post I would like to walk through how FalkorDB scales out. As a quick recap, FalkorDB is a native graph database developed as a Redis module, and it can manage thousands of individual graphs on a single instance.

Baseline

We start out with a single FalkorDB instance; let's call it primary. This instance handles both READ and WRITE operations.

# create primary database
docker run --name primary --rm -p 6379:6379 falkordb/falkordb

As a next step we would like to isolate our read queries from our writes. To do so, we fire up a new FalkorDB instance, name it secondary, and define it as a replica of primary. Once the initial replication between the two servers is done, we can divert all of our read queries to secondary and only hand off write queries to primary.

# create replica
docker run --name secondary --rm -p 6380:6380 -e REDIS_ARGS="--port 6380 --replicaof 172.17.0.2 6379" falkordb/falkordb:edge

Multiple replicas

It is worth mentioning that we're not limited to just a single READ replica; we can create as many READ replicas as we need, e.g. a single primary and three read replicas: replica-1, replica-2, and replica-3. A load balancer can distribute the read load among these three replicas.

Distribute graphs

In the former example we've distributed the entire dataset from the primary database to multiple replicas. In cases where multiple graphs are managed on a single server, e.g. primary-1 holds graphs G-1, G-2, and G-3, we can distribute the graphs among multiple servers: primary-1 would manage G-1, and a new server primary-2 would host G-2 and G-3. Write operations will be routed to the appropriate server depending on the accessed graph key. Of course, each one of these primary servers can have multiple read replicas, e.g. primary-1 can have two read replicas while primary-2 replicates its dataset to just a single replica.

Efficient replication

FalkorDB version 4 introduces a quick and efficient way of replicating queries between a primary and its replicas. Up until recently, a WRITE query (one which ran to completion and modified a graph) would be replicated as-is to all replicas, causing each replica to re-run the query. Although such a replication scheme is simple and straightforward, it entails a number of issues:

1. A replicated query might fail due to insufficient resources or a timeout.
2. Using time-related or random functions within a query risks ending up with data discrepancies.

# Usage of time and randomness
MATCH (a), (b)
WHERE a.create < time() - 100 AND b.id = tointeger(100 * rand())
CREATE (a)-[:R]->(b)

# Although some WRITE queries are short and quick to execute, e.g.
CREATE (:Country {name:'Montenegro'})

# Others might include long and costly read portions, e.g.
MATCH (c:Country)
WITH avg(c.area / c.population) AS avg_density
MATCH (c:Country)
WHERE c.area / c.population > avg_density
SET c.crowded = true

It would be a waste of time to re-run write queries on the replicas; the primary DB has already done the hard work of computing the "change-set". So instead of sending the original query to its replicas, the primary sends the query's "effects". An effect is a compact binary representation of a change, e.g. connect node 5 to node 72 with a new edge, or update node 81's 'score' attribute to the value 4. Replicating via effects solves the two problems mentioned earlier, in addition to saving the time spent computing what needs to be changed on the replicas.
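To illustrate the read/write split in application code, here is a minimal sketch with the FalkorDB Python client against the primary and replica started above. The ports come from this post's setup; `ro_query` is the client's read-only query call as I understand it, and the graph name is an assumption:

```python
from falkordb import FalkorDB

# Writes go to the primary; reads go to the replica.
primary = FalkorDB(host="localhost", port=6379).select_graph("G-1")
replica = FalkorDB(host="localhost", port=6380).select_graph("G-1")

# WRITE: only the primary accepts modifications.
primary.query("CREATE (:Country {name:'Montenegro'})")

# READ: served by the replica via a read-only query (GRAPH.RO_QUERY),
# which a load balancer could spread across many replicas.
res = replica.ro_query("MATCH (c:Country) RETURN count(c)")
print(res.result_set[0][0])
```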
Benchmarking

The benchmark tests three setups:

1. A single primary
2. Primary & replica, separating reads from writes
3. Primary & 2 replicas, scaling out reads

We query a graph with ~50M nodes and ~50M edges.

# Creating the dataset:
CREATE INDEX FOR (p:Person) ON (p.id)

UNWIND range(0, 1000000) AS x CREATE (p:Person {id:x})

MATCH (p:Person) UNWIND range(0, toInteger(rand() * 100)) AS x CREATE (p)-[:CONNECTED]->(:Z)

# READ query:
MATCH (p:Person {id:$id})-[]->() RETURN count(1)

# WRITE query:
MATCH (a:Person {id:$a_id}) CREATE (a)-[:CONNECTED]->(:Z)
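For completeness, here is a hedged sketch of how the benchmark's parameterized READ and WRITE queries could be driven from Python (the graph name and id range are illustrative):

```python
import random
from falkordb import FalkorDB

g = FalkorDB(host="localhost", port=6379).select_graph("bench")

# READ path: point lookup by the indexed id, then count outgoing edges.
pid = random.randint(0, 1_000_000)
reads = g.query(
    "MATCH (p:Person {id:$id})-[]->() RETURN count(1)", {"id": pid}
)
print(reads.result_set[0][0])

# WRITE path: attach a new :Z node to an existing person.
g.query(
    "MATCH (a:Person {id:$a_id}) CREATE (a)-[:CONNECTED]->(:Z)", {"a_id": pid}
)
```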