String Interning in Graph Databases: Save Memory, Boost Performance


Roi Lipman
Date Published: August 10, 2025
Date Updated: August 12, 2025

New York City, NYC?

In graph databases, efficiency and scalability are everything, especially when working with large, richly connected data. One of the lesser-known challenges is the memory consumed by storing identical string values over and over again. Think of common attributes like “city,” “category,” or “status” that might appear on thousands, or even millions, of nodes and edges. When every repeated value is stored as a separate string, memory usage can quickly spiral, putting pressure on resources and limiting how much data can be managed effectively.

A powerful solution to this problem is string interning: a method that can help developers deduplicate identical strings, and ensure that each unique value is stored just once, no matter how many times it appears. This optimization can lead to significant memory savings and can even accelerate certain query operations, making it an essential tool for anyone building or maintaining large-scale graph applications.

In this article, I cover how FalkorDB implements string interning, and how it can help you save memory if you work with large graphs.

Check out FalkorDB's full list of available Cypher functions

				
					CREATE (:A {v:intern('VERY LONG STRING')})

What is String Interning?

String interning is a memory optimization technique that ensures identical string values are stored only once in memory. Instead of keeping separate copies of the same text, like “New York” or “Active”, across thousands of nodes or edges, string interning creates a single, shared instance of each unique value.

How does it work? When you add a string value to the database (for example, “Completed” or “In Transit”), you wrap it in the intern() function to request interning. Every later value passed through intern() is deduplicated against the strings already stored. As a result, every occurrence of “Completed” or “In Transit” across the graph points to the same place in memory.
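To make the idea concrete, here is a minimal conceptual sketch of an intern pool in Python. It only illustrates the general technique: the `InternPool` class and its method names are hypothetical and are not a description of how FalkorDB implements interning internally.

```python
class InternPool:
    """Toy intern pool: keeps one shared object per distinct string value."""

    def __init__(self):
        self._pool = {}

    def intern(self, value: str) -> str:
        # setdefault stores `value` the first time it is seen and returns
        # the already-stored object on every later call with an equal string.
        return self._pool.setdefault(value, value)

    def __len__(self):
        return len(self._pool)


pool = InternPool()

# Build two equal strings at runtime so they start out as separate objects.
first = "".join(["In ", "Transit"])
second = "".join(["In ", "Transit"])

shared_first = pool.intern(first)
shared_second = pool.intern(second)

print(shared_first is shared_second)  # True: both names refer to one stored object
print(len(pool))                      # 1: only one distinct value is kept
```

The pool grows with the number of distinct values, not with the number of times each value is used, which is the property that makes interning attractive for highly repetitive attributes.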

This matters for graph databases, where certain attributes (like city names, product categories, or status labels) often repeat across a vast number of nodes and relationships. Without interning, these repeated values can quickly add up, wasting precious memory and potentially slowing down query performance. By deduplicating these strings, interning keeps your graph lean, efficient, and ready to scale.

Why Does It Matter in Graph Databases?

Graph databases are designed to represent complex relationships and connections, often at massive scale. As your data grows, so does the chance that many nodes and edges will share the same attribute values. For example, in a social FOAF (friend-of-a-friend) graph where millions of user profiles include a “gender” field, the same few values will be repeated millions of times. Or, consider a knowledge graph organizing information with standardized labels such as “type”, “category”, or “status”, which may be reused countless times across different entities and relationships.

In these scenarios, a graph database without interning stores every repeated string as a separate object in memory. This means that if 100,000 people in your graph have the gender “female”, there could be 100,000 separate “female” strings taking up space, rather than a single shared instance. The same applies to product codes, job titles, genres, or any other commonly repeated attribute.

If you could deduplicate these repeated values through string interning, you could substantially reduce the memory footprint of your graph, and free up resources for larger or more complex datasets.
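As a back-of-the-envelope illustration of the 100,000 “female” strings mentioned above, the sketch below compares the raw string payload with and without sharing. The function names are hypothetical, and the model is deliberately simplified: it assumes UTF-8 storage with one terminator byte per copy and ignores allocator overhead, per-property references, and any internal bookkeeping, so real numbers will differ.

```python
def payload_bytes(value: str) -> int:
    """UTF-8 size of the string plus one terminator byte (an assumption)."""
    return len(value.encode("utf-8")) + 1


def estimate_string_storage(value: str, occurrences: int) -> dict:
    """Compare one copy per occurrence with a single shared copy."""
    per_copy = payload_bytes(value)
    without_interning = per_copy * occurrences
    with_interning = per_copy  # one shared copy, however often it is referenced
    return {
        "without_interning": without_interning,
        "with_interning": with_interning,
        "saved": without_interning - with_interning,
    }


estimate = estimate_string_storage("female", 100_000)
print(estimate)
# {'without_interning': 700000, 'with_interning': 7, 'saved': 699993}
```

The longer the repeated value and the more often it occurs, the larger the gap becomes; short values that rarely repeat gain little.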

How String Interning Works in FalkorDB

String interning is a fully integrated feature as of FalkorDB v4.10, offering deduplication of repeated string values across your graph. This enhancement is especially impactful for datasets with high attribute redundancy, delivering improved memory efficiency and performance.

From version 4.10 onward, you can deduplicate strings with a simple intern() function call. When a value passes through intern(), FalkorDB stores it once and reuses that shared instance wherever the same value appears; queries that read the property need no further changes.

This feature is particularly helpful for large-scale data ingestion. If you bulk-import user profiles and thousands share the same values for properties like “city” or “status”, FalkorDB ensures only one copy of each distinct string is stored in memory, significantly reducing redundancy.

Here’s how you can use it:

				
					MATCH (u:User)
SET u.city = intern(u.city)

As documented, the intern() function “deduplicates the input string by storing a single internal copy across the database.”

Notably, you can use interned strings just like regular strings in Cypher queries: you can use them freely in matches, filters, traversals, and aggregations, but with the added benefit of reduced memory usage and faster reference equality comparisons.

The intern() String Function: Practical Guidance

The intern() function gives you explicit control to optimize memory usage or cleanse legacy data. Here’s how to maximize its benefits:

Syntax and Usage in Cypher

The intern() function takes a single string argument and returns a deduplicated, interned version that points to a shared memory buffer:

				
					MATCH (n:User)
SET n.name = intern(n.name)

The intern() function deduplicates the input string by storing a single internal copy across the database. As a result, all identical “name” values will share the same string instance in memory.

When to Use intern()

Bulk Imports: When loading large data sets (e.g., CSV of user profiles), you can use intern() during the load query on frequently repeated fields to consolidate memory usage.
Schema Migrations: When renaming or merging properties, applying intern() helps avoid reintroducing duplicate values. Essentially, the copy of an interned string is an interned string.
ETL Pipelines and Data Cleaning: Pair intern() with transformations (toLower(), trimming, etc.) to ensure that minor variations don’t fragment memory representations.

For example, here’s how you can use the intern() function during LOAD CSV:

				
					LOAD CSV FROM 'file://users.csv' AS row
CREATE (u:User {name: intern(row[1])})

Handling Edge Cases

Exact-match Deduplication: Interning only matches exactly identical strings. Values like “New york” vs. “new york” or “New York ” remain distinct unless you normalize them prior to interning.
Large Transactions: Interning across millions of nodes in one go can momentarily increase transaction load. For best results, batch updates to reduce memory spikes.
Redundancy with Automatic Interning: Where a value is already deduplicated by the system, calling intern() on it again is usually unnecessary. Apply intern() only in targeted scenarios where you need explicit control.
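The exact-match behavior described above can be illustrated with a small client-side normalization step in Python. This is a sketch under assumptions: `normalize_for_interning` is a hypothetical helper that mirrors the trim-and-lowercase approach mentioned in this article, and it would run in your own loading code before values are sent to intern().

```python
def normalize_for_interning(value: str) -> str:
    """Trim surrounding whitespace and lowercase the value."""
    return value.strip().lower()


raw_values = ["New york", "new york", "New York "]

distinct_before = set(raw_values)
distinct_after = {normalize_for_interning(v) for v in raw_values}

print(len(distinct_before))  # 3: each variant would be interned separately
print(len(distinct_after))   # 1: all variants collapse to "new york"
```

Lowercasing changes the stored value, so apply it only to properties where case carries no meaning; for display names, trimming alone may be the safer choice.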

Below are Figures 1 & 2: Demonstrating the intern() string function in FalkorDB:

Performance and Memory Benefits

Let’s now discuss the performance and memory benefits of string interning and what you can expect.

Reduction in Memory Usage

Previously, every duplicate string value, such as city names or status labels, would occupy its own memory slot in the database, leading to significant overhead when the same value appeared across thousands or millions of nodes and relationships. String interning consolidates these duplicates so that only one instance of each unique string is stored, regardless of how often it’s used.

You can quantify this improvement directly with the GRAPH.MEMORY USAGE command, which provides a detailed breakdown of memory consumption that you can capture before and after interning.

				
// Check memory usage before interning (server command)
GRAPH.MEMORY USAGE my_graph

// Perform interning on a commonly repeated property (Cypher, sent with GRAPH.QUERY)
GRAPH.QUERY my_graph "MATCH (n) WHERE n.city IS NOT NULL SET n.city = intern(n.city)"

// Check memory usage after interning
GRAPH.MEMORY USAGE my_graph

Comparing the two outputs shows how much memory has been reclaimed through deduplication.
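If you capture both outputs programmatically, a small helper can compute the difference for you. The sketch below assumes the reply arrives as a flat list of alternating metric names and values; if your client already returns a dictionary, skip `reply_to_dict`. The helper names and the metric names in the sample data are placeholders for illustration, not FalkorDB's actual field names.

```python
from numbers import Number


def reply_to_dict(reply):
    """Turn a flat [name, value, name, value, ...] reply into a dict."""
    names = reply[0::2]
    values = reply[1::2]
    return {
        (name.decode() if isinstance(name, bytes) else name): value
        for name, value in zip(names, values)
    }


def memory_delta(before, after):
    """Return after - before for every numeric metric present in both replies."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after
        and isinstance(before[name], Number)
        and isinstance(after[name], Number)
    }


# Hypothetical sample replies; real metric names and values will differ.
before = reply_to_dict(["total_mb", 120, "node_attributes_mb", 80])
after = reply_to_dict(["total_mb", 95, "node_attributes_mb", 55])

print(memory_delta(before, after))
# {'total_mb': -25, 'node_attributes_mb': -25}
```

Negative numbers indicate memory that was freed between the two measurements. Non-numeric or nested entries are skipped rather than guessed at.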

Impact on Query Performance

Beyond memory reduction, string interning also speeds up equality comparisons during query execution. Because the database can compare string references directly (rather than comparing every character), filters, aggregations, and traversals involving interned string properties become more efficient. This is particularly valuable in workloads that filter or group by common string attributes, as well as when executing analytics or real-time recommendations.
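The difference between comparing references and comparing characters can be seen in plain Python, which has its own interning facility in `sys.intern`. This is an analogy only, not FalkorDB code: it shows that once two equal strings have been interned, checking whether they are the same object is enough to know they are equal.

```python
import sys

# Build equal strings at runtime so they start out as separate objects.
plain_a = "".join(["critical_", "vulnerability"])
plain_b = "".join(["critical_", "vulnerability"])

interned_a = sys.intern(plain_a)
interned_b = sys.intern(plain_b)

# Content comparison: has to look at the characters of both strings.
print(plain_a == plain_b)

# Identity comparison: a single reference check, independent of string length.
print(interned_a is interned_b)
```

An identity check costs the same for a 5-character string as for a 5,000-character one, which is why reference comparison is attractive for filters on long, frequently repeated values.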

Visual Impact

A simple chart or table can help illustrate the benefit, and you can track the change over time using the GRAPH.MEMORY USAGE output.

Attribute	Before Interning	After Interning
City: “New York”	500,000 strings	1 string

Code Demo & Benchmark

To illustrate the real-world impact of string interning in FalkorDB, let’s walk through a simple experiment comparing query performance with and without interning. We’ll set up a small test graph, measure performance, and highlight the difference.

1. Start FalkorDB with Docker

First, spin up a fresh FalkorDB instance using Docker:

				
					docker run -d \
  --name falkordb-instance \
  -p 6379:6379 \
  -p 3000:3000 \
  falkordb/falkordb:latest

2. Import Libraries and Initialize the Client

Next, import the required libraries and initialize the Python client:

				
					import time
from falkordb import FalkorDB


db = FalkorDB(host='localhost', port=6379)

3. Create and Prepare the Graph

We’ll create a new graph called intern_demo. To ensure a clean slate, remove any existing nodes:

				
graph = db.select_graph('intern_demo')
graph.query("MATCH (n) DELETE n")  # remove any nodes left over from earlier runs

We’ll use a shared string for the interning test:

				
					SHARED_STRING = "critical_vulnerability" # Shared string for interning

4. Insert Nodes With and Without Interning

Now, let’s define a function to create nodes. This allows us to toggle between using the intern() scalar function and not using it:

				
					def create_nodes(use_intern, string_value, count=100):
    for i in range(count):
        if use_intern:
            query = "CREATE (n:Vulnerability {id: $id, type: intern($type)})"
        else:
            query = "CREATE (n:Vulnerability {id: $id, type: $type})"
        graph.query(query, {'id': i, 'type': string_value})

5. Query the Graph and Measure Performance

We’ll run a simple query to count nodes by type, measuring the execution time for both approaches. We’ll run both tests and compare results:

				
def run_query(use_intern):
    # Same filter in both cases; only the comparison value differs (interned or plain)
    if use_intern:
        query = "MATCH (n:Vulnerability) WHERE n.type = intern($type) RETURN count(n)"
    else:
        query = "MATCH (n:Vulnerability) WHERE n.type = $type RETURN count(n)"
    start = time.perf_counter()  # monotonic clock, better suited to short intervals
    result = graph.query(query, {'type': SHARED_STRING})
    return time.perf_counter() - start, result.result_set[0][0]


# Test 1: With string interning
create_nodes(True, SHARED_STRING, 10000)
query_time, count = run_query(True)
print(f"Nodes found: {count}, Query time: {query_time:.6f} seconds")


# Clear and reset for next test
graph.delete()
graph = db.select_graph('intern_demo')


# Test 2: Without string interning
create_nodes(False, SHARED_STRING, 10000)
query_time, count = run_query(False)
print(f"Nodes found: {count}, Query time: {query_time:.6f} seconds")

				
					Nodes found: 10000, Query time: 0.003378 seconds
Nodes found: 10000, Query time: 0.255518 seconds

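In this run, the query against interned values finished in about 3.4 ms, compared with about 255 ms without interning. That is consistent with the expected benefit of deduplication and cheaper string comparisons, although a single measurement like this is only indicative.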
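Because a single timing can be skewed by warm-up effects and caching, it helps to repeat the measurement and look at the median. The sketch below is a generic timing harness; `time_repeated` is a hypothetical helper, and the workload is a stand-in so the example runs without a database connection.

```python
import statistics
import time


def time_repeated(operation, repeats=20, warmup=3):
    """Run `operation` several times and return per-run durations in seconds."""
    for _ in range(warmup):
        operation()  # warm-up runs are not measured
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; swap in a call that runs your query instead.
durations = time_repeated(lambda: sum(range(10_000)), repeats=10)
print(
    f"median: {statistics.median(durations):.6f}s, "
    f"min: {min(durations):.6f}s, max: {max(durations):.6f}s"
)
```

To apply this to the demo, you could pass `lambda: run_query(True)` and `lambda: run_query(False)` as the operation and compare the two medians.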

Migration and Best Practices

Introducing string interning into production graphs requires thoughtful planning, both for existing data and new projects alike. Here’s how to approach this efficiently.

Introducing Interning to Existing Graphs

Analyze current memory usage

Begin by running:

GRAPH.MEMORY USAGE my_graph

This command breaks down memory usage by categories (nodes, edges, string storage), helping you identify high-duplication properties.

Target high-duplication fields

Common candidates include properties like city, status, and category. Focus on those with substantial repetition.
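One way to judge how much repetition a property has is to sample its values and count duplicates. The sketch below assumes you have already pulled a list of values into Python (for example, from a query that returns the property for a sample of nodes); `duplication_report` is a hypothetical helper and the sample data is made up.

```python
from collections import Counter


def duplication_report(values):
    """Summarize how repetitive a list of property values is."""
    counts = Counter(values)
    total = len(values)
    distinct = len(counts)
    return {
        "total": total,
        "distinct": distinct,
        # Share of values that merely repeat an earlier one.
        "duplicate_ratio": (total - distinct) / total if total else 0.0,
        "most_common": counts.most_common(3),
    }


sample_cities = ["New York"] * 6 + ["Boston"] * 3 + ["Austin"]
report = duplication_report(sample_cities)
print(report)
```

A duplicate ratio close to 1 marks a strong interning candidate, while a ratio near 0 (such as unique identifiers) suggests interning would add little.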

Run interning in batches

Apply string interning selectively using:

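MATCH (n)
WHERE n.city IS NOT NULL
WITH n LIMIT 10000
SET n.city = intern(n.city);

Repeat these batches to minimize transaction load and avoid long-running queries. As written, the query selects up to 10,000 matching nodes without tracking which ones earlier runs already processed, so advance through the graph between runs (for example, by adding a SKIP offset or filtering on a range of node IDs) to make sure each batch covers new nodes.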
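The batching loop can be driven from a client. The following Python sketch is one possible approach, not an official recipe: `intern_property_in_batches` is a hypothetical helper, and it assumes that SKIP and LIMIT accept query parameters, that `id(n)` gives a stable ordering, that SET may be followed by `RETURN count(n)`, and that no other writers add or remove matching nodes while it runs. The `graph` argument is any object with a `query(text, params)` method whose result exposes `result_set`, as the Python client used in this article does.

```python
import re


def intern_property_in_batches(graph, prop, batch_size=10_000):
    """Intern `prop` on every node that has it, one bounded batch per query."""
    # Property names cannot be passed as query parameters,
    # so validate the name before building the query text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", prop):
        raise ValueError(f"unsafe property name: {prop!r}")

    query = (
        f"MATCH (n) WHERE n.{prop} IS NOT NULL "
        "WITH n ORDER BY id(n) SKIP $skip LIMIT $limit "
        f"SET n.{prop} = intern(n.{prop}) "
        "RETURN count(n)"
    )

    skip = 0
    total = 0
    while True:
        result = graph.query(query, {"skip": skip, "limit": batch_size})
        rows = result.result_set
        processed = rows[0][0] if rows else 0
        total += processed
        if processed < batch_size:
            return total  # a short (or empty) batch means the end was reached
        skip += batch_size
```

A call such as `intern_property_in_batches(db.select_graph('my_graph'), 'city')` would then walk the graph in steps of 10,000 nodes and return the number of nodes it updated.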

Reassess memory usage

After interning, rerun GRAPH.MEMORY USAGE to measure reduced overhead. Aim for memory savings in line with expected duplication reduction.

Best Practices for New Projects

Design with interning in mind
Early awareness can help you normalize frequently repeated string fields (e.g., country, category) and optionally apply intern() during insert-time.
Normalize string formats
To maximize deduplication, cleanse strings (e.g., use toLower() and trim()) before interning, ensuring consistent matching.
Monitor memory over time
Regularly schedule GRAPH.MEMORY USAGE to catch new duplication issues early and assess whether periodic intern() runs are needed.
Batch large updates
When applying intern() across massive datasets, segment updates into batches (e.g., 10k–50k nodes) to avoid memory pressure or long-running transactions.

Real-world Applications

We will now discuss real-world applications where string interning brings noticeable benefits, especially in scenarios with heavy string reuse.

Cybersecurity

One of the most compelling use cases for string interning is in cybersecurity. If you’re managing complex security data, you know that graph databases are a natural fit: they let you integrate data from diverse sources, work with massive datasets, and uncover dependencies that traditional tools often overlook. By modeling a digital twin of your entire IT infrastructure as a graph, you can analyze relationships and quickly identify potential attack paths that cross multiple systems. For example, with queries like:

				
					MATCH (user:User) WHERE NOT (user)-[:IN_GROUP]->(group:SecurityGroup)
RETURN user

MATCH (sw:InstalledSoftware) WHERE NOT (sw)<-[:ALLOWED_BY_POLICY]-(p:Policy)
RETURN sw

… you can pinpoint users missing from security groups or software not protected by policy, all in real time.

In these scenarios, string interning helps you reduce memory usage dramatically by deduplicating repeated values like usernames, group names, or policy identifiers. This keeps your security graphs lean, fast, and ready for real-time analysis as your data grows.

Cloud Infrastructure and Security Management

You’ll see similar benefits in Cloud Infrastructure Entitlement Management (CIEM). If you’re tracking identities, roles, and permissions across your cloud environments, your graphs are probably full of repeated role names, permission types, and resource IDs. With string interning, each unique value is stored just once, so you can scale up without worrying about wasted memory or slow traversals.

Cloud Security Posture Management (CSPM) platforms are another area where string interning pays off. CSPM tools rely on graph databases to monitor cloud configurations for vulnerabilities and compliance. Here, configuration values, service names, and status labels show up over and over again. String interning keeps your graph database efficient, letting you analyze relationships across multiple clouds at speed. Whether you’re storing IP addresses, domain names, user IDs, or event categories, string interning gives you a scalable way to manage the repetitive data that’s everywhere in security-focused graphs.

Social & Knowledge Graphs

In our experience supporting large-scale knowledge graph deployments like VCPedia (venture capital data extraction) and Fractal KG (self-organizing graph structures), we’ve observed how repetitive string properties such as “country,” “category,” and “type” can rapidly inflate memory usage. Before string interning, each instance of a common value, like “United States”, would be stored separately across millions of nodes and edges, driving up hardware costs and impacting performance.

With string interning, we deduplicate these values automatically or using the intern() function. This change has delivered memory savings of 30%–50%.

Supply Chain & Product Catalogs

In supply chain and product catalog graphs, repeated attributes like product_code, SKU, and vendor_name can appear across millions of nodes and relationships. Previously, each repeated value would consume additional memory, making it challenging to scale as catalogs or supplier networks grew.

With string interning in FalkorDB, identifiers such as “SKU-12345” or “Acme Supplies” are stored only once, regardless of how often they appear. This simple change has cut memory usage by up to 50% in our internal benchmarks, enabling organizations to efficiently model larger, more complex supply networks and deliver insights faster—all while keeping infrastructure costs low.

Conclusion

String interning represents a powerful yet seamless optimization for anyone working with large graph datasets. This feature is designed to work in the background for everyday operations, while also giving you precise control for bulk updates, migrations, or data cleaning tasks.

We encourage you to explore string interning in your own projects and see the impact firsthand.

Ready to get started?

Check out our documentation, visit our GitHub repository, or join the conversation on our community forums. We’d love to hear about your use cases, results, and ideas!

FAQ

Editor’s note: The following FAQ has been updated to reflect current trends and best-practices in August 2025.

How does string interning in FalkorDB improve both memory usage and query performance?

String interning deduplicates repeated string values so each unique value is stored only once in memory. This can cut memory usage by 20%–60% on graphs with high attribute repetition. Query performance improves because equality checks on interned strings compare memory references instead of character sequences, reducing the cost of filters, aggregations, and joins on repeated attributes.

What’s the recommended approach for introducing string interning into an existing production graph?

First, run GRAPH.MEMORY USAGE to identify properties with high duplication (e.g., city, status). Then, apply intern() in batches to avoid transaction spikes:

				
					MATCH (n)
WHERE n.city IS NOT NULL
WITH n LIMIT 10000
SET n.city = intern(n.city);

When is it best to apply intern() during data ingestion workflows?

Use intern() at insert-time for bulk loads, schema migrations, or ETL transformations where repeated strings are expected. Normalize values (e.g., toLower(), trim()) before interning to maximize deduplication. Example with CSV import:

				
					LOAD CSV FROM 'file://users.csv' AS row
CREATE (u:User {name: intern(toLower(trim(row[1])))});

This ensures every identical string maps to the same memory reference from the moment it’s inserted.

Citations

FalkorDB Documentation — String Interning. FalkorDB v4.10 Release Notes. Available at: https://github.com/FalkorDB/FalkorDB/pull/1095
FalkorDB Documentation — GRAPH.MEMORY USAGE Command. Available at: https://falkordb.com/docs/commands/graph.memory-usage/
FalkorDB Documentation — Cypher intern() Function Reference. Available at: https://falkordb.com/docs/cypher/functions/string/#intern
FalkorDB Docker Image. Available at: https://hub.docker.com/r/falkordb/falkordb
FalkorDB Python Client. Available at: https://pypi.org/project/falkordb/


Roi Lipman

Roi Lipman serves as CTO at FalkorDB, leading the development of ultra-low-latency graph database platforms for generative AI and retrieval-augmented generation (RAG) workflows. He brings over 20 years of database engineering expertise from roles at Forter, StreamRail, Maglan, AVG and the Israel Intelligence Corps. As creator and lead architect of RedisGraph for the past eight years, he optimized Cypher-based knowledge graph performance for enterprise-scale AI applications.