Building & Querying a Knowledge Graph from Unstructured Data


Diffbot's API, FalkorDB, and LangChain work well together for building applications that answer questions about unstructured data.

Diffbot offers an API that extracts structured data, meaning entities and the relationships between them, from unstructured documents such as web pages, PDFs, or emails. You can use that output to build a knowledge graph of your documents and store it in FalkorDB, a graph database. LangChain can then query the knowledge graph in natural language. It translates a question into a Cypher query, runs the query against FalkorDB, and uses an LLM to phrase the results as an answer.

1. Installing LangChain

First, install LangChain and a few dependencies with pip:

pip install langchain langchain-experimental openai redis wikipedia
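You may want to confirm that the packages installed correctly before moving on. The following standard-library sketch is not part of the original walkthrough. It reports the installed version of each distribution from the pip command above, and installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command above.
REQUIRED = ["langchain", "langchain-experimental", "openai", "redis", "wikipedia"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:25} {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be installed again with pip before you continue.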

2. Starting FalkorDB server locally

Starting a local FalkorDB server is as simple as running its Docker image. The FalkorDB documentation describes other ways to run it.

> docker run -p 6379:6379 -it --rm falkordb/falkordb:latest

6:C 26 Aug 2023 08:36:26.297 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
6:C 26 Aug 2023 08:36:26.297 # Redis version=7.2.1, bits=64, commit=00000000, modified=0, pid=6, just started
...
...
6:M 26 Aug 2023 08:36:26.322 * <graph> Starting up FalkorDB version 99.99.99.
6:M 26 Aug 2023 08:36:26.324 * <graph> Thread pool created, using 8 threads.
6:M 26 Aug 2023 08:36:26.324 * <graph> Maximum number of OpenMP threads set to 8
6:M 26 Aug 2023 08:36:26.324 * <graph> Query backlog size: 1000
6:M 26 Aug 2023 08:36:26.324 * Module 'graph' loaded from /FalkorDB/bin/linux-x64-release/src/falkordb.so
6:M 26 Aug 2023 08:36:26.324 * Ready to accept connections
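The log line "Module 'graph' loaded" shows that FalkorDB runs as a Redis module, so it speaks the Redis wire protocol (RESP). You can check that the server is reachable by sending a RESP-encoded PING over a plain socket. The sketch below does this without any dependencies and is not something the original post uses. The helpers encode_resp_command and falkordb_is_up are hypothetical names. In practice, the redis client installed above does the same thing with redis.Redis(port=6379).ping().

```python
import socket


def encode_resp_command(*args):
    """Encode a command as a RESP array of bulk strings (the Redis wire format)."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = str(arg).encode()
        # Each bulk string is "$<byte length>\r\n<bytes>\r\n".
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def falkordb_is_up(host="localhost", port=6379, timeout=2.0):
    """Return True if the server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(encode_resp_command("PING"))
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        # Covers refused connections and timeouts.
        return False


if __name__ == "__main__":
    print("FalkorDB reachable:", falkordb_is_up())
```

The same encoder also produces graph commands. For example, encode_resp_command("GRAPH.QUERY", "demo", "RETURN 1") is the wire form of a one-line Cypher query against a graph named demo (a hypothetical graph name).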
        

Running the demo

The rest of this post walks through the steps to get started. You can also follow along in the Google Colab notebook.

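3. Creating a Knowledge Graph

Now, let's build a demo knowledge graph about Warren Buffett from Wikipedia. WikipediaLoader fetches the Wikipedia pages that match the query, and DiffbotGraphTransformer sends them to Diffbot, which extracts their entities and relationships as graph documents.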

from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer
from langchain.document_loaders import WikipediaLoader

# Replace with your own Diffbot API key
diffbot_api_key = "DIFFBOT_API_KEY"
diffbot_nlp = DiffbotGraphTransformer(diffbot_api_key=diffbot_api_key)

# Load the Wikipedia pages that match the query
query = "Warren Buffett"
raw_documents = WikipediaLoader(query=query).load()

# Extract entities and relationships with Diffbot and convert them to graph documents
graph_documents = diffbot_nlp.convert_to_graph_documents(raw_documents)
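It can help to check what Diffbot actually extracted before storing anything. This sketch is not part of the original walkthrough. It counts node types and relationship types across the graph documents. It assumes only the nodes, relationships, and type attributes exposed by LangChain's GraphDocument, Node, and Relationship classes, and summarize_graph_documents is a hypothetical helper name.

```python
from collections import Counter


def summarize_graph_documents(graph_documents):
    """Count extracted node types and relationship types across all documents.

    Relies only on the `nodes`, `relationships`, and `type` attributes of
    LangChain-style graph documents (duck typing, no LangChain import needed).
    """
    node_types = Counter()
    rel_types = Counter()
    for doc in graph_documents:
        node_types.update(node.type for node in doc.nodes)
        rel_types.update(rel.type for rel in doc.relationships)
    return node_types, rel_types


# Usage with the graph_documents produced above:
# node_types, rel_types = summarize_graph_documents(graph_documents)
# print(node_types.most_common(5))
# print(rel_types.most_common(5))
```

Cypher queries generated later will refer to these relationship types, for example EDUCATED_AT. Skimming them first tells you which questions the graph can plausibly answer.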
        

4. Storing the Knowledge Graph in FalkorDB

Next, store the knowledge graph in FalkorDB:

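from langchain.graphs import FalkorDBGraph

# Connect to the local FalkorDB server (localhost:6379 by default)
# and use a graph named "falkordb"
graph = FalkorDBGraph(
    "falkordb",
)

# Write the extracted nodes and relationships, then refresh the cached
# graph schema that is later used to generate Cypher queries
graph.add_graph_documents(graph_documents)
graph.refresh_schema()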
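Conceptually, add_graph_documents turns each extracted node and relationship into Cypher write queries. The sketch below illustrates that idea with parameterized MERGE statements. It is a simplified, hypothetical illustration, not FalkorDBGraph's actual implementation, and the helper names are made up. Keying MERGE on the node id means an entity mentioned on several Wikipedia pages is created only once rather than duplicated.

```python
def node_merge(label, node_id, properties):
    """Parameterized MERGE for one node, keyed on its id so repeats are not duplicated."""
    query = f"MERGE (n:`{label}` {{id: $id}}) SET n += $props"
    return query, {"id": node_id, "props": dict(properties)}


def relationship_merge(src_label, src_id, rel_type, dst_label, dst_id):
    """Parameterized MERGE that links two nodes already written by node_merge."""
    query = (
        f"MATCH (a:`{src_label}` {{id: $src}}), (b:`{dst_label}` {{id: $dst}}) "
        f"MERGE (a)-[:`{rel_type}`]->(b)"
    )
    return query, {"src": src_id, "dst": dst_id}


def graph_document_to_queries(doc):
    """Yield (query, params) pairs: all nodes first, then relationships.

    Assumes LangChain-style objects: doc.nodes items have id/type/properties,
    doc.relationships items have source/target (nodes) and type.
    """
    for node in doc.nodes:
        yield node_merge(node.type, node.id, node.properties)
    for rel in doc.relationships:
        yield relationship_merge(
            rel.source.type, rel.source.id, rel.type, rel.target.type, rel.target.id
        )


# Hypothetical usage against the FalkorDBGraph created above:
# for doc in graph_documents:
#     for query, params in graph_document_to_queries(doc):
#         graph.query(query, params)
```

Nodes are written before relationships because each relationship MATCHes its two endpoints. If the endpoints did not exist yet, the MATCH would find nothing and the relationship would be silently skipped.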
        

5. Querying the Graph

You're all set to query the knowledge graph. Let's try a couple of questions.

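# Set your OpenAI API key (Jupyter/Colab magic); replace the placeholder value
%env OPENAI_API_KEY=OPENAI_API_KEY

from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI

# cypher_llm writes Cypher from the question and the graph schema;
# qa_llm turns the query results into a natural-language answer
chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model_name="gpt-4"),
    qa_llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    graph=graph,
    verbose=True,
)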
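GraphCypherQAChain works in two stages, which is why it accepts two models. First, cypher_llm turns the question and the graph schema into a Cypher query, and the query runs against FalkorDB. Then, qa_llm turns the returned rows into an answer. These are the "Generated Cypher" and "Full Context" lines in the verbose output below. The sketch mirrors that flow with plain callables so that each stage can be swapped or tested on its own. It is a simplified illustration with hypothetical names, not LangChain's implementation.

```python
import re


def extract_cypher(text):
    """Pull the query out of a fenced code block in an LLM reply, else return the stripped text."""
    match = re.search(r"```(?:cypher)?\s*(.*?)\s*```", text, re.DOTALL)
    return match.group(1) if match else text.strip()


def answer_question(question, schema, generate_cypher, run_query, phrase_answer):
    """Simplified two-stage graph QA flow (illustration, not LangChain's code).

    generate_cypher(schema, question) -> str   stage 1, the cypher_llm's job
    run_query(cypher) -> list of rows          executed by the graph database
    phrase_answer(question, rows) -> str       stage 2, the qa_llm's job
    """
    cypher = extract_cypher(generate_cypher(schema, question))
    rows = run_query(cypher)
    return {"cypher": cypher, "context": rows, "answer": phrase_answer(question, rows)}
```

In a real setup, generate_cypher and phrase_answer would be LLM calls and run_query would be graph.query. The split also suggests why the configuration above puts the stronger model on the Cypher step. A wrong query returns the wrong context, and the answer model can only phrase what it is given.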


chain.run("Which university did Warren Buffett attend?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Warren Buffett"})-[:EDUCATED_AT]->(o:Organization)
RETURN o.name
Full Context:
[['Woodrow Wilson High School'], ['Alice Deal Junior High School'], ['Columbia Business School'], ['New York Institute of Finance']]

> Finished chain.

'Warren Buffett attended Columbia Business School.'

chain.run("Who is or was working at Berkshire Hathaway?")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[r:EMPLOYEE_OR_MEMBER_OF]->(o:Organization) WHERE o.name = 'Berkshire Hathaway' RETURN p.name
Full Context:
[['Warren Buffett'], ['Charlie Munger'], ['Howard Buffett'], ['Susan Buffett'], ['Howard'], ['Oliver Chace']]

> Finished chain.

'Warren Buffett, Charlie Munger, Howard Buffett, Susan Buffett, Howard, and Oliver Chace are or were working at Berkshire Hathaway.'