
FalkorDB on Snowflake: Native Graph Database for Cloud Data Warehouses



Figure: architecture diagram of FalkorDB running on Snowflake.

FalkorDB brings graph database capabilities to Snowflake through a Native App that turns relational tables into queryable knowledge graphs while keeping the data governed by Snowflake's security model. This post covers the architecture, how to get running, and the operational details you need for production workloads.

The Underlying Problem

Relational tables store data in rows. But many real queries are about *relationships between* rows: transaction chains, dependency paths, shared attributes across entities, circular references.

SQL handles this with recursive CTEs and self-joins. That works at shallow depth, but becomes unreadable and slow as traversal depth increases. A three-hop relationship query might require a recursive CTE with multiple joins. A six-hop query becomes a maintenance burden that runs for minutes.

Consider a fraud investigation: “Find all accounts within 3 hops of this suspicious transaction, including shared addresses and phone numbers.”

In SQL, this requires a recursive CTE, multiple self-joins across tables, and manual deduplication. The query grows linearly with each relationship type you add. In Cypher:

				
MATCH (t:Transaction {id: 'TXN-001'})-[:INVOLVES*1..3]-(a:Account)
OPTIONAL MATCH (a)-[:HAS_PHONE|HAS_ADDRESS]-(shared)<-[:HAS_PHONE|HAS_ADDRESS]-(linked:Account)
RETURN DISTINCT a.name, a.phone, linked.name, linked.phone

This is not a stylistic preference. Graph query languages represent traversals directly. The query structure matches the question structure. That matters when analysts need to iterate on relationship queries daily.
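For contrast, the same 3-hop question in SQL needs a recursive CTE. Here is a sketch with illustrative table and column names (transactions, account_links, and accounts are assumptions for this example, not the app's schema):

```sql
-- Accounts within 3 hops of a suspicious transaction (illustrative schema).
WITH RECURSIVE reachable AS (
    -- Seed: accounts directly involved in the transaction
    SELECT t.account_id, 1 AS depth
    FROM transactions t
    WHERE t.id = 'TXN-001'
    UNION ALL
    -- Expand: follow account-to-account links up to depth 3
    SELECT l.linked_account_id, r.depth + 1
    FROM reachable r
    JOIN account_links l ON l.account_id = r.account_id
    WHERE r.depth < 3
)
SELECT DISTINCT a.name, a.phone
FROM reachable r
JOIN accounts a ON a.account_id = r.account_id;
```

And this covers only the traversal half of the question; the shared phone and address checks would add several more joins and deduplication logic on top.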

Who This Is For

This fits teams that keep their data platform centered on Snowflake but run into relationship-heavy problems that SQL handles awkwardly.

Operational Querying: Data Engineers

For teams maintaining Snowflake pipelines who keep getting asked for relationship queries that turn into brittle recursive CTEs, join explosions, or custom post-processing. Typical needs: multi-hop traversals, path analysis, connected components.

Exploration and Modeling: Data Scientists and Analysts

For people exploring network structure, building knowledge graphs for GenAI and RAG, or engineering graph-driven features without spinning up another database or workflow stack.

Platform Evaluation: Teams Evaluating Graph Databases

For organizations that want graph capabilities but do not want a separate cluster, another ETL pipeline moving data out of Snowflake, or a new security boundary to govern.

Architecture

How It Works

FalkorDB runs inside your Snowflake account through three mechanisms:

Snowpark Container Services (SPCS)

Hosts the FalkorDB engine in an isolated compute pool. The engine processes Cypher queries and manages in-memory graph storage. It does not share compute resources with your warehouse workloads.

Reference Binding

Provides secure access to your tables. The Native App cannot see your data by default. You explicitly bind specific tables through the Snowflake UI, granting scoped read access without manual GRANT statements.

Service Functions

Bridge SQL and Cypher. Wrapper procedures handle CSV staging, data loading, query execution, and resource cleanup.

Data stays within your Snowflake account throughout this flow. No external network calls, no data exports.

Use Cases

These are the problem categories where graph queries provide a structural advantage over SQL:

Use Case: Fraud Detection

Uncover Fraud Rings Inside Snowflake

Your transaction data already lives in Snowflake. With FalkorDB, you can run Cypher graph queries directly on that data, with no ETL and no data movement, to expose hidden fraud networks that SQL alone cannot see.

The Scenario: Data Already in Snowflake

A fintech company stores millions of transactions in Snowflake. Their compliance team suspects a fraud ring: multiple accounts sharing devices, funneling money through shell merchants, and cycling funds back to the same beneficiary. Traditional SQL queries with self-joins can catch simple cases, but multi-hop relationship patterns are invisible to relational queries.

Figure: the bound source tables (TRANSACTIONS, ACCOUNTS, DEVICES, MERCHANTS) feeding the graph.

The Hidden Fraud Ring

Once loaded into FalkorDB, relationships between accounts, devices, and merchants reveal a cluster of accounts connected through shared devices and circular money flows.

Edge types in the graph: money flow, shared device, and circular flow (the fraud signal).

The Investigation: SQL vs. Cypher

Walk through three investigation steps. Each one shows how graph queries inside Snowflake surface patterns that are impractical to express in SQL.

"Which accounts are using the same physical device?"

Shared devices between unrelated accounts are a classic indicator of synthetic identity fraud or account takeover.

SQL (possible, but fragile):
SELECT a1.account_id, a2.account_id,
       d.device_fingerprint
FROM devices d
JOIN account_devices ad1
  ON d.device_id = ad1.device_id
JOIN account_devices ad2
  ON d.device_id = ad2.device_id
  AND ad1.account_id < ad2.account_id
JOIN accounts a1
  ON ad1.account_id = a1.account_id
JOIN accounts a2
  ON ad2.account_id = a2.account_id;

4 JOINs, 3 tables, and this only finds direct device sharing (1 hop).

Cypher (natural and readable):
MATCH (a1:Account)
      -[:USES]->(d:Device)
      <-[:USES]-(a2:Account)
WHERE a1 <> a2
RETURN a1.name, a2.name,
       d.fingerprint

The query reads like the question. No JOINs, no bridging tables.

Account 1 | Account 2 | Device
----------|-----------|-------------
Alice     | Bob       | iPhone-7X92
Charlie   | Dave      | Android-3K41

Insight: Two pairs of supposedly unrelated accounts share devices. But are they connected to each other? Step 2 reveals the money trail.
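The circular money flows mentioned in the scenario are the same kind of query. A Cypher sketch (the SENT relationship type is an assumption for illustration; only USES appears in the step above):

```cypher
// Accounts whose outgoing money flow cycles back to them within 2-4 hops.
// SENT is an assumed relationship type for this sketch.
MATCH path = (a:Account)-[:SENT*2..4]->(a)
RETURN a.name, [n IN nodes(path) | n.name] AS cycle
```

The variable-length pattern binds the cycle length directly; a SQL equivalent would need a recursive CTE that tracks the visited path and tests whether it returns to its starting account.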

Graph Queries, Where Your Data Already Lives

FalkorDB brings Cypher, one of the most expressive graph query languages, directly into Snowflake. No data movement, no external infrastructure. Your compliance team gets answers in seconds instead of days of SQL engineering.

Identity Resolution and Customer 360

Linking fragmented records across systems by shared attributes (email, phone, address) is a graph traversal problem:
				
MATCH (c1:Customer)-[:HAS_EMAIL]->(e:Email)<-[:HAS_EMAIL]-(c2:Customer)
WHERE c1 <> c2
MERGE (c1)-[:SAME_AS]->(c2)
RETURN c1.source_system, c2.source_system, e.address

Infrastructure and Network Dependency Mapping

Impact analysis (“what breaks if this service goes down?”) is a directed traversal:

				
MATCH (s:Service {name: 'auth-service'})<-[:DEPENDS_ON*1..5]-(downstream)
RETURN downstream.name, downstream.criticality
ORDER BY downstream.criticality DESC

Getting Started

Install the App

Navigate to the Snowflake Marketplace, search for “FalkorDB,” and click **Get**. Installation creates the app with the necessary permissions and roles.

Start the Service

-- Start the service (creates compute pool and warehouse)
CALL app_public.start_app('falkordb_pool', 'falkordb_wh');

-- Check service status
CALL app_public.get_service_status();

Wait for status READY (typically 2–3 minutes).

Bind Your Data

  1. Navigate to Data Products > Apps > FalkorDB
  2. Go to Security > References
  3. Click + Add next to "Consumer Data Table"
  4. Select your database, schema, and table
  5. Click Save

FalkorDB can now access the bound table through Snowflake's permission model.

Load Data into a Graph

-- Load customer data (MERGE prevents duplicates on reload)
CALL app_public.load_csv(
  'customer_graph',
  'LOAD CSV FROM ''file://consumer_data.csv'' AS row
   MERGE (c:Customer {id: row[0]})
   ON CREATE SET c.name = row[1], c.email = row[2], c.city = row[3]
   ON MATCH SET c.name = row[1], c.email = row[2], c.city = row[3]'
);

The procedure exports the bound table to CSV staging, passes it to the FalkorDB engine via service function, and cleans up temporary files. Access columns by index: row[0], row[1], etc. (0-indexed).

MERGE vs CREATE

Use MERGE to safely reload data without duplicates. MERGE matches existing nodes by key property (e.g., id) and updates them, or creates new ones if they don't exist. Use CREATE only for one-time bulk loads where duplicates are not a concern.
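A minimal illustration of the difference, run against a scratch graph (the Temp and Customer labels here are illustrative):

```cypher
// Running this statement twice leaves two duplicate nodes:
CREATE (t:Temp {id: 1})

// Running this statement twice leaves exactly one node, updated in place:
MERGE (c:Customer {id: 1})
ON CREATE SET c.created = timestamp()
ON MATCH SET c.updated = timestamp()
```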


Querying and Updates

Query the Graph

Run Cypher through Snowflake wrapper procedures, return structured result sets, and keep the graph current with incremental MERGE updates instead of full rebuilds.

Read Path

Run a Cypher query from SQL

CALL app_public.graph_query('customer_graph',
  'MATCH (c:Customer {city: ''New York''})' ||
  ' RETURN c.name, c.email'
);

Graph queries return structured data that you can process further in Snowflake using standard SQL.

Structured output ready for joins, filters, and downstream SQL
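One common Snowflake pattern is to capture the procedure's output with RESULT_SCAN and keep working in SQL. A sketch; the exact column layout returned by graph_query is not documented here, so treat the SELECT as illustrative:

```sql
-- Run the graph query...
CALL app_public.graph_query('customer_graph',
  'MATCH (c:Customer) RETURN c.city, count(c) AS customers');

-- ...then re-read the last statement's result set as a table.
SELECT *
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
ORDER BY 1;
```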

Write Path

Apply incremental updates with MERGE

CALL app_public.load_csv(
  'customer_graph',
  'LOAD CSV FROM ''file://consumer_data.csv'' AS row
   MERGE (c:Customer {id: row[0]})
   ON CREATE SET c.name = row[1], c.city = row[2], c.created = timestamp()
   ON MATCH SET c.name = row[1], c.city = row[2], c.updated = timestamp()'
);
MERGE matches nodes by a key property (id in this example). If the node already exists, ON MATCH updates the changed fields; if not, ON CREATE inserts a new node without duplicating prior graph state.

This supports continuous data pipelines without duplicates or full graph rebuilds.

Operations

Compute Pool Lifecycle and Cost Management

SPCS compute pools are billed differently from warehouses. This becomes operationally important in production because idle pools keep accruing charges until you suspend them explicitly.

The Key Difference

Warehouses auto-suspend. Compute pools do not.

Behavior differences between Snowflake warehouses and SPCS compute pools:

Behavior             | Snowflake Warehouse      | SPCS Compute Pool
---------------------|--------------------------|-------------------------------------
Auto-suspend on idle | Yes, configurable        | No
Billing when idle    | Stops after auto-suspend | Continues until explicitly suspended
Resume               | Automatic on query       | Manual or via start_app
Operational takeaway

A warehouse naturally stops costing money after inactivity. A compute pool stays ACTIVE until you suspend it, even if nobody is querying FalkorDB.

Managing Pool Lifecycle

Check status, suspend to stop charges, resume when needed

SQL commands
-- Check current pool status
SHOW COMPUTE POOLS;

-- Suspend when not in use (stops charges)
ALTER COMPUTE POOL falkordb_pool SUSPEND;

-- Resume when needed
ALTER COMPUTE POOL falkordb_pool RESUME;
  1. Use SHOW COMPUTE POOLS to confirm whether the pool is still active.
  2. Suspend the pool after query work is finished so billing stops immediately.
  3. Resume manually before use, or let your app workflow invoke start_app.
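If you want something closer to warehouse-style auto-suspend, one option is a scheduled Snowflake task that suspends the pool off-hours. A sketch; the schedule and the privileges required to alter the pool are assumptions to adapt:

```sql
-- Suspend the FalkorDB pool every evening at 20:00 UTC (serverless task).
CREATE OR REPLACE TASK suspend_falkordb_pool
  SCHEDULE = 'USING CRON 0 20 * * * UTC'
AS
  ALTER COMPUTE POOL falkordb_pool SUSPEND;

-- Tasks are created in a suspended state; enable explicitly.
ALTER TASK suspend_falkordb_pool RESUME;
```

Note that suspending stops the service regardless of in-flight queries, so schedule it for a window when nobody is using the graph.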

Technical Reference

Service Functions & Wrapper Procedures

Core endpoints for raw graph operations plus the wrappers that make ingestion and querying safer inside Snowflake.


Service Functions

Function           | Purpose
-------------------|------------------------
load_csv_raw()     | Graph loading endpoint
graph_query_raw()  | Cypher query execution
graph_list_raw()   | Enumerate graphs
graph_delete_raw() | Graph removal


Wrapper Procedures

Procedure     | Purpose
--------------|------------------------------------------------------------------
load_csv()    | JavaScript wrapper handling CSV export/import with automatic cleanup
graph_query() | SQL wrapper for Cypher queries with error handling

Why FalkorDB

FalkorDB’s engine is built on GraphBLAS, a linear algebra framework that represents graph operations as sparse matrix computations. This architectural choice matters because it maps graph traversals to optimized matrix operations rather than pointer-chasing, which improves cache behavior and computational throughput on modern hardware.

FalkorDB uses openCypher. If you have written Cypher queries before, the syntax will be familiar; there is no new query language to learn.

FalkorDB is open source. The core engine is on GitHub. The Snowflake Native App packages this engine for deployment within your Snowflake account.
