Frequently Asked Questions

String Interning & Memory Optimization

What is string interning in FalkorDB and why is it important?

String interning in FalkorDB is a memory optimization technique that ensures identical string values are stored only once in memory. This is crucial for graph databases, where attributes like "city" or "status" may repeat across millions of nodes and edges. By deduplicating these values, string interning significantly reduces memory usage and improves query performance, especially in large-scale graphs. Source

How does string interning improve memory usage in FalkorDB?

String interning consolidates duplicate string values so that only one instance of each unique string is stored, regardless of how many times it appears. This can reduce memory usage by 20%–60% on graphs with high attribute repetition. For example, if 500,000 nodes share the city "New York," only one string is stored instead of 500,000. Source

How does string interning affect query performance in FalkorDB?

With string interning, equality checks in queries compare memory references instead of character sequences, making filters, aggregations, and joins on repeated attributes much faster. In benchmark tests, queries on interned strings ran up to 75x faster than on non-interned strings. Source

How do I use the intern() function in FalkorDB?

The intern() function in Cypher takes a string argument and returns a deduplicated, interned version. For example: MATCH (n:User) SET n.name = intern(n.name). This ensures all identical "name" values share the same memory instance. Source

When should I apply string interning during data ingestion?

Apply intern() during bulk imports, schema migrations, or ETL transformations where repeated strings are expected. Normalize values (e.g., toLower(), trim()) before interning to maximize deduplication. Example: LOAD CSV FROM 'file://users.csv' AS row CREATE (u:User {name: intern(toLower(trim(row[1])))}).

How can I measure the memory savings from string interning?

Use the GRAPH.MEMORY USAGE command before and after applying intern() to see how many bytes have been reclaimed. This command provides a detailed breakdown of memory consumption by category. Source

What are best practices for introducing string interning to an existing graph?

First, analyze current memory usage with GRAPH.MEMORY USAGE. Target high-duplication fields (e.g., city, status), and apply intern() in batches (e.g., 10,000 nodes at a time) to avoid transaction spikes. Reassess memory usage after each batch to track savings. Source

Does string interning affect how I write Cypher queries in FalkorDB?

No, interned strings can be used just like regular strings in Cypher queries. You can use them in matches, filters, traversals, and aggregations, but with the added benefit of reduced memory usage and faster reference equality comparisons. Source

What edge cases should I consider when using string interning?

Interning only matches exactly identical strings. Values like "New york" vs. "new york" or "New York " remain distinct unless you normalize them first. For large transactions, batch updates to avoid memory spikes. Source

How does string interning help in cybersecurity and cloud security graphs?

In cybersecurity and cloud security graphs, string interning deduplicates repeated values like usernames, group names, or policy identifiers. This reduces memory usage and enables real-time analysis of large, complex datasets, making it easier to identify attack paths and compliance gaps. Source

What are the benefits of string interning for supply chain and product catalog graphs?

String interning in supply chain and product catalog graphs ensures that repeated attributes like product codes, SKUs, and vendor names are stored only once. This can cut memory usage by up to 50%, enabling organizations to efficiently model larger, more complex supply networks. Source

How does string interning impact social and knowledge graphs?

In social and knowledge graphs, repetitive properties like "country," "category," and "type" can rapidly inflate memory usage. String interning deduplicates these values, delivering memory savings of 30%–50% in large deployments. Source

Is string interning available in all versions of FalkorDB?

String interning is fully integrated in FalkorDB v4.10 and later. Users of earlier versions should upgrade to take advantage of this feature. Source

Can I use string interning with Docker deployments of FalkorDB?

Yes, you can use string interning with FalkorDB Docker deployments. Simply run the latest FalkorDB image from Docker Hub to access all features, including string interning. Source

Where can I find more documentation on string interning in FalkorDB?

Comprehensive documentation on string interning and the intern() function is available at docs.falkordb.com.

How does FalkorDB's string interning compare to other graph databases?

FalkorDB's string interning is designed for high efficiency and seamless integration, offering significant memory and performance benefits. Compared to competitors like Neo4j, FalkorDB provides up to 6x better memory efficiency and up to 496x faster latency for certain workloads. Source

What is the business impact of using string interning in FalkorDB?

Organizations using string interning in FalkorDB can expect lower infrastructure costs, faster query performance, and the ability to scale to larger datasets. This leads to improved operational efficiency and faster time-to-insight for analytics and AI applications. Source

Can string interning be used for legacy data in FalkorDB?

Yes, you can apply intern() to legacy data during schema migrations or data cleaning. This helps deduplicate repeated values and optimize memory usage in existing graphs. Source

How does FalkorDB support large-scale, high-dimensional data?

FalkorDB supports over 10,000 multi-graphs, flexible horizontal scaling, and advanced memory optimization features like string interning. This makes it ideal for enterprises and SaaS providers managing complex, high-dimensional datasets. Source

What are the main use cases for string interning in FalkorDB?

Main use cases include cybersecurity graphs, cloud infrastructure management, social and knowledge graphs, and supply chain/product catalog graphs—any scenario with heavy reuse of string attributes. Source

How does FalkorDB's string interning feature help with regulatory compliance?

By reducing memory usage and improving performance, string interning enables organizations to efficiently map regulations to workflows and identify compliance gaps in large, complex datasets. This is especially useful for financial and security compliance applications. Source

What customer success stories demonstrate the benefits of FalkorDB's memory optimization?

AdaptX used FalkorDB to analyze high-dimensional clinical data, uncovering hidden insights and improving recommendations. XR.Voyage overcame scalability challenges in immersive media, and Virtuous AI built a high-performance, multi-modal data store for ethical AI development. Case Studies

How does FalkorDB ensure security and compliance for enterprise deployments?

FalkorDB is SOC 2 Type II compliant, ensuring rigorous standards for security, availability, processing integrity, confidentiality, and privacy. This makes it suitable for regulated industries and enterprise-grade deployments. Source

What integrations does FalkorDB support for advanced analytics and AI?

FalkorDB integrates with frameworks like Graphiti, g.v(), Cognee, LangChain, and LlamaIndex for AI agent memory, knowledge graph visualization, and LLM integration. Source

How quickly can I implement FalkorDB and start using string interning?

FalkorDB is built for rapid deployment, allowing teams to go from concept to enterprise-grade solutions in weeks, not months. You can start immediately with cloud sign-up, Docker deployment, or by following the official documentation. Source

What support resources are available for FalkorDB users?

FalkorDB offers comprehensive documentation, community support via Discord and GitHub Discussions, solution architects for tailored advice, and free trial/demo options for onboarding. Source

How does FalkorDB's open-source model benefit users?

FalkorDB's open-source licensing encourages community collaboration and transparency, allowing users to contribute, audit, and extend the platform. This contrasts with proprietary solutions and supports innovation. Source

What are the pricing options for FalkorDB?

FalkorDB offers a FREE plan for MVPs, a STARTUP plan starting from /1GB/month, a PRO plan from 0/8GB/month, and an ENTERPRISE plan with tailored pricing and advanced features. Source

Who are FalkorDB's main competitors and how does it compare?

FalkorDB's main competitors include Neo4j, AWS Neptune, TigerGraph, and ArangoDB. FalkorDB stands out with up to 496x faster latency, 6x better memory efficiency, built-in multi-tenancy, and open-source licensing. Source

What roles and industries benefit most from FalkorDB?

FalkorDB is designed for developers, data scientists, engineers, and security analysts in enterprises, SaaS providers, healthcare, media, and AI/ethical AI development. Source

String Interning in Graph Databases: Save Memory, Boost Performance

String Interning at FalkorDB

New York City, NYC?

In graph databases, efficiency and scalability are everything, especially when working with large, richly connected data. One of the lesser-known challenges is the memory consumed by storing identical string values over and over again. Think of common attributes like “city,” “category,” or “status” that might appear on thousands, or even millions, of nodes and edges. When every repeated value is stored as a separate string, memory usage can quickly spiral, putting pressure on resources and limiting how much data can be managed effectively.

A powerful solution to this problem is string interning: a method that can help developers deduplicate identical strings, and ensure that each unique value is stored just once, no matter how many times it appears. This optimization can lead to significant memory savings and can even accelerate certain query operations, making it an essential tool for anyone building or maintaining large-scale graph applications.

In this article, I cover how FalkorDB implements string interning, and how it can help you save memory if you work with large graphs.

Check out FalkorDB's full list of available Cypher functions

				
					CREATE (:A {v:intern('VERY LONG STRING')})

				
			

What is String Interning?

String interning is a memory optimization technique that ensures identical string values are stored only once in memory. Instead of keeping separate copies of the same text, like “New York” or “Active”, across thousands of nodes or edges, string interning creates a single, shared instance of each unique value.

How does it work? When you add a string value to the database (for example, “Completed” or “In Transit”), you can wrap it in intern() to request interning. Any later value passed through intern() that is identical to an already interned string reuses the stored copy. As a result, every interned occurrence of “Completed” or “In Transit” across the graph points to the same place in memory.

This matters for graph databases, where certain attributes (like city names, product categories, or status labels) often repeat across a vast number of nodes and relationships. Without interning, these repeated values can quickly add up, wasting precious memory and potentially slowing down query performance. By deduplicating these strings, interning keeps your graph lean, efficient, and ready to scale.
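The mechanism described in this section can be illustrated with a toy intern pool in Python. This is a conceptual sketch only, not FalkorDB's implementation; the `InternPool` class and its method names are hypothetical and exist purely for illustration.

```python
class InternPool:
    """Toy intern pool: keeps one shared instance per distinct string."""

    def __init__(self):
        self._pool = {}

    def intern(self, value: str) -> str:
        # setdefault returns the already stored instance when an equal
        # string exists; otherwise it stores and returns the one passed in.
        return self._pool.setdefault(value, value)

    def __len__(self):
        return len(self._pool)


pool = InternPool()

# Build equal strings at runtime so each one starts as its own object,
# then route every one of them through the pool.
statuses = [pool.intern("".join(["In ", "Transit"])) for _ in range(1000)]

print(len(pool))                                 # distinct strings stored
print(all(s is statuses[0] for s in statuses))   # do all entries share one instance?
```

The pool holds a single entry no matter how many equal strings pass through it, which is the property that makes interning attractive for heavily repeated attributes.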


Why Does It Matter in Graph Databases?

Graph databases are designed to represent complex relationships and connections, often at massive scale. As your data grows, so does the chance that many nodes and edges will share the same attribute values. For example, in a social FOAF (friend-of-a-friend) graph, where millions of user profiles include a “gender” field, the string will be repeated millions of times. Or, consider a knowledge graph organizing information with standardized labels such as “type”, “category”, or “status”, which may be reused countless times across different entities and relationships.

Without interning, a graph database stores every repeated string as a separate object in memory. This means that if 100,000 people in your graph have the gender “female”, there could be 100,000 separate “female” strings taking up space, rather than a single shared instance. The same applies to product codes, job titles, genres, or any other commonly repeated attribute.

If you could deduplicate these repeated values through string interning, you could substantially reduce the memory footprint of your graph, and free up resources for larger or more complex datasets.
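To put rough numbers on the example above, here is a back-of-the-envelope estimate in Python. It counts only the UTF-8 payload of the string and ignores per-string headers, references, and allocator overhead, so it is an assumption-laden lower bound rather than a measurement of FalkorDB; the function name is hypothetical.

```python
def estimate_duplicate_bytes(value: str, occurrences: int) -> dict:
    """Compare payload bytes for per-occurrence copies vs. one shared copy."""
    payload = len(value.encode("utf-8"))   # bytes needed for one copy
    duplicated = payload * occurrences     # one copy per node
    interned = payload                     # a single shared copy
    return {
        "duplicated": duplicated,
        "interned": interned,
        "saved": duplicated - interned,
    }


# The scenario from the text: 100,000 nodes that all carry "female".
estimate = estimate_duplicate_bytes("female", 100_000)
print(estimate)
```

Longer values such as city names or status labels scale the same way: the saving grows with both the string length and the number of repetitions.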

How String Interning Works in FalkorDB

String interning is now a fully integrated feature in FalkorDB v4.10, offering deduplication of repeated string values across your graph. This enhancement is especially impactful for datasets with high attribute redundancy, delivering improved memory efficiency and performance.

From version 4.10 onward, you can deduplicate strings through a simple intern() function call. When you pass a string to intern(), FalkorDB keeps one shared instance in memory and reuses it for every identical value; no extra steps or changes to the rest of your queries are required.

This feature is particularly helpful for large-scale data ingestion. If you bulk-import user profiles and thousands share the same values for properties like “city” or “status”, FalkorDB ensures only one copy of each distinct string is stored in memory, significantly reducing redundancy.

Here’s how you can use it:

				
					MATCH (u:User)
SET u.city = intern(u.city)

				
			

As documented, the intern() function “deduplicates the input string by storing a single internal copy across the database.” 

Notably, you can use interned strings just like regular strings in Cypher queries: you can use them freely in matches, filters, traversals, and aggregations, but with the added benefit of reduced memory usage and faster reference equality comparisons.

The intern() String Function: Practical Guidance

The intern() function gives you explicit control to optimize memory usage or cleanse legacy data. Here’s how to maximize its benefits:

Syntax and Usage in Cypher

The intern() function takes a single string argument and returns a deduplicated, interned version that points to a shared memory buffer:

				
					MATCH (n:User)
SET n.name = intern(n.name)

				
			

As a result, all identical “name” values share the same string instance in memory.

When to Use intern()

  • Bulk Imports: When loading large data sets (e.g., CSV of user profiles), you can use intern() during the load query on frequently repeated fields to consolidate memory usage.
  • Schema Migrations: When renaming or merging properties, applying intern() helps avoid reintroducing duplicate values. A copy of an interned string is itself an interned string.
  • ETL Pipelines and Data Cleaning: Pair intern() with transformations (toLower(), trimming, etc.) to ensure that minor variations don’t fragment memory representations.

 

For example, to apply intern() during LOAD CSV:

				
					LOAD CSV FROM 'file://users.csv' AS row
CREATE (u:User {name: intern(row[1])}) 
				
			

Handling Edge Cases

  • Exact-match Deduplication: Interning only matches exactly identical strings. Values like “New york” vs. “new york” or “New York ” remain distinct unless you normalize them prior to interning.
  • Large Transactions: Interning across millions of nodes in one go can momentarily increase transaction load. For best results, batch updates to reduce memory spikes.
  • Redundancy with Automatic Interning: Calling intern() on values that the system's built-in deduplication already handles is usually unnecessary. Apply intern() only in targeted scenarios where you need explicit control.
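The exact-match behavior in the first bullet can be demonstrated with a short Python sketch. The `normalize` helper is hypothetical and only approximates what trim() followed by toLower() would do in Cypher; it shows why normalizing before interning collapses near-duplicates.

```python
def normalize(value: str) -> str:
    """Strip surrounding whitespace and lowercase the value."""
    return value.strip().lower()


# The three variants mentioned in the text above.
raw_values = ["New york", "new york", "New York "]

distinct_raw = set(raw_values)
distinct_normalized = {normalize(v) for v in raw_values}

print(len(distinct_raw))         # distinct strings before normalization
print(len(distinct_normalized))  # distinct strings after normalization
```

Without normalization, each variant would be interned as its own string; after normalization they all map to a single value.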

 

Below are Figures 1 & 2: Demonstrating the intern() string function in FalkorDB:

[Figure 1: intern() string function diagram]
[Figure 2: intern() string function diagram]

Performance and Memory Benefits

Let’s now discuss the performance and memory benefits of string interning and what you can expect.

Reduction in Memory Usage

Previously, every duplicate string value, such as city names or status labels, would occupy its own memory slot in the database, leading to significant overhead when the same value appeared across thousands or millions of nodes and relationships. String interning consolidates these duplicates so that only one instance of each unique string is stored, regardless of how often it’s used.

You can quantify this improvement directly with the GRAPH.MEMORY USAGE command, which provides a detailed breakdown of memory consumption before and after interning.

				
					// Check memory usage before interning
GRAPH.MEMORY USAGE my_graph

// Perform interning on a commonly repeated property
GRAPH.QUERY my_graph "MATCH (n) SET n.city = intern(n.city)"

// Check memory usage after interning
GRAPH.MEMORY USAGE my_graph

				
			

This will show you how many bytes have been reclaimed through deduplication.
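Once you have a total from each of the two readings, the comparison is simple arithmetic. The helper below is a minimal sketch; the function name is hypothetical, and the byte counts in the example are made-up placeholders rather than real GRAPH.MEMORY USAGE output.

```python
def memory_savings(before_bytes: int, after_bytes: int):
    """Return (bytes reclaimed, percentage saved) between two readings."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    reclaimed = before_bytes - after_bytes
    return reclaimed, 100.0 * reclaimed / before_bytes


# Hypothetical totals, for illustration only.
reclaimed, percent = memory_savings(before_bytes=200_000_000,
                                    after_bytes=120_000_000)
print(f"{reclaimed} bytes reclaimed ({percent:.1f}%)")
```

Recording both numbers for each property you intern makes it easy to see which fields contributed most of the savings.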

Impact on Query Performance

Beyond memory reduction, string interning also speeds up equality comparisons during query execution. Because the database can compare string references directly (rather than comparing every character), filters, aggregations, and traversals involving interned string properties become more efficient. This is particularly valuable in workloads that filter or group by common string attributes, as well as when executing analytics or real-time recommendations.
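The difference between comparing references and comparing characters can be sketched in Python, using the standard library's sys.intern as a stand-in for a database intern pool. This illustrates the general idea only and makes no claim about FalkorDB's internals; the helper name is hypothetical.

```python
import sys


def count_matches_by_identity(values, target):
    """Count entries that are the very same object as the interned target."""
    target = sys.intern(target)
    # `is` checks object identity, so no characters are compared here.
    return sum(1 for v in values if v is target)


# Build equal strings at runtime, then intern each so they share one instance.
types = [sys.intern("".join(["critical", "_vulnerability"]))
         for _ in range(10_000)]
types.append(sys.intern("low_risk"))

matches = count_matches_by_identity(types, "critical_vulnerability")
print(matches)
```

An identity check costs the same regardless of string length, whereas a character-by-character comparison of equal strings has to scan the whole value.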

Visual Impact

A simple table illustrates the benefit; you can track the change over time using the GRAPH.MEMORY USAGE output.

Attribute           Before Interning    After Interning
City: “New York”    500,000 strings     1 string

Code Demo & Benchmark

To illustrate the real-world impact of string interning in FalkorDB, let’s walk through a simple experiment comparing query performance with and without interning. We’ll set up a small test graph, measure performance, and highlight the difference.

1. Start FalkorDB with Docker

First, spin up a fresh FalkorDB instance using Docker:

				
					docker run -d \
  --name falkordb-instance \
  -p 6379:6379 \
  -p 3000:3000 \
  falkordb/falkordb:latest
				
			

2. Import Libraries and Initialize the Client

Next, import the required libraries and initialize the Python client:

				
					import time
from falkordb import FalkorDB


db = FalkorDB(host='localhost', port=6379)
				
			

3. Create and Prepare the Graph

We’ll create a new graph called intern_demo. To ensure a clean slate, remove any existing nodes:

				
					graph = db.select_graph('intern_demo')
graph.query("MATCH (n) DELETE n")  # remove any nodes left over from earlier runs

				
			

We’ll use a shared string for the interning test:

				
					SHARED_STRING = "critical_vulnerability" # Shared string for interning

				
			

4. Insert Nodes With and Without Interning

Now, let’s define a function to create nodes. This allows us to toggle between using the intern() scalar function and not using it:

				
					def create_nodes(use_intern, string_value, count=100):
    for i in range(count):
        if use_intern:
            query = "CREATE (n:Vulnerability {id: $id, type: intern($type)})"
        else:
            query = "CREATE (n:Vulnerability {id: $id, type: $type})"
        graph.query(query, {'id': i, 'type': string_value})
				
			

5. Query the Graph and Measure Performance

We’ll run a simple query to count nodes by type, measuring the execution time for both approaches. We’ll run both tests and compare results:

				
					def run_query(use_intern):
    # Pick the query variant; only the interned path wraps the parameter in intern()
    if use_intern:
        query = "MATCH (n:Vulnerability) WHERE n.type = intern($type) RETURN count(n)"
    else:
        query = "MATCH (n:Vulnerability) WHERE n.type = $type RETURN count(n)"
    # Time only the round trip to the database
    start = time.perf_counter()
    result = graph.query(query, {'type': SHARED_STRING})
    return time.perf_counter() - start, result.result_set[0][0]


# Test 1: With string interning
create_nodes(True, SHARED_STRING, 10000)
query_time, count = run_query(True)
print(f"Nodes found: {count}, Query time: {query_time:.6f} seconds")


# Clear and reset for next test
graph.delete()
graph = db.select_graph('intern_demo')


# Test 2: Without string interning
create_nodes(False, SHARED_STRING, 10000)
query_time, count = run_query(False)
print(f"Nodes found: {count}, Query time: {query_time:.6f} seconds")
				
			
				
					Nodes found: 10000, Query time: 0.003378 seconds
Nodes found: 10000, Query time: 0.255518 seconds

				
			

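As anticipated, the query runs faster when string interning is used: in this run it finished in about 3.4 ms, versus about 255.5 ms without interning, roughly a 75x difference, thanks to deduplication and more efficient string comparisons.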
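The figures above come from a single timed call per variant. A slightly more robust way to compare them is to repeat each measurement and take the median. The helper below is a sketch with a hypothetical name, shown with a stand-in workload so it runs without a database; in the demo you would pass something like `lambda: run_query(True)` instead.

```python
import statistics
import time


def median_seconds(fn, repeats=5):
    """Run fn() `repeats` times and return the median wall-clock duration."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Stand-in workload so the sketch is runnable without a FalkorDB instance.
elapsed = median_seconds(lambda: sum(range(100_000)), repeats=5)
print(f"median: {elapsed:.6f} s")

# Ratio of the two timings reported in the output above.
speedup = 0.255518 / 0.003378
print(f"speedup: {speedup:.1f}x")
```

Taking the median reduces the influence of one-off effects such as a cold cache or connection setup on the first call.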

Migration and Best Practices

Introducing string interning into production graphs requires thoughtful planning, both for existing data and new projects alike. Here’s how to approach this efficiently.

Introducing Interning to Existing Graphs

Analyze current memory usage

Begin by running:

GRAPH.MEMORY USAGE my_graph

This command breaks down memory usage by categories (nodes, edges, string storage), helping you identify high-duplication properties.

Target high-duplication fields

Common candidates include properties like city, status, and category. Focus on those with substantial repetition.

Run interning in batches

Apply string interning selectively using:

MATCH (n)
WHERE n.city IS NOT NULL
WITH n LIMIT 10000
SET n.city = intern(n.city);

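Repeat these batches until every node has been processed, advancing the window on each pass (for example, by adding SKIP before LIMIT) so that already interned nodes are not selected again. Batching minimizes transaction load and avoids long-running queries.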
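The batch query filters on n.city IS NOT NULL, which still matches nodes that have already been interned, so repeated runs need some way to move forward. One possible approach is to generate one statement per batch with an increasing SKIP offset, as in the Python sketch below. The function name is hypothetical, and the sketch assumes that SKIP and LIMIT are accepted after WITH and that the scan order stays stable between batches; adding an ORDER BY would remove the second assumption at the cost of a sort.

```python
def batched_intern_queries(total_nodes, batch_size=10_000, prop="city"):
    """Yield one Cypher statement per batch, advancing with SKIP."""
    # The property name is interpolated into the query, so validate it first.
    if not prop.isidentifier():
        raise ValueError(f"unexpected property name: {prop!r}")
    for skip in range(0, total_nodes, batch_size):
        yield (
            f"MATCH (n) WHERE n.{prop} IS NOT NULL "
            f"WITH n SKIP {skip} LIMIT {batch_size} "
            f"SET n.{prop} = intern(n.{prop})"
        )


# total_nodes is assumed to be the number of nodes that carry the property.
queries = list(batched_intern_queries(total_nodes=25_000))
for q in queries:
    print(q)
```

Each generated statement can then be sent through the client of your choice, checking memory usage between batches if you want to track progress.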

Reassess memory usage

After interning, rerun GRAPH.MEMORY USAGE to measure reduced overhead. Aim for memory savings in line with expected duplication reduction.

Best Practices for New Projects

  1. Design with interning in mind
    Early awareness can help you normalize frequently repeated string fields (e.g., country, category) and optionally apply intern() at insert time.
  2. Normalize string formats
    To maximize deduplication, cleanse strings (e.g., use toLower() and trim()) before interning, ensuring consistent matching.
  3. Monitor memory over time
    Regularly schedule GRAPH.MEMORY USAGE to catch new duplication issues early and assess whether periodic intern() runs are needed.
  4. Batch large updates
    When applying intern() across massive datasets, segment updates into batches (e.g., 10k–50k nodes) to avoid memory pressure or long-running transactions.

Real-world Applications

We will now discuss real-world applications where string interning brings noticeable benefits, especially in scenarios with heavy string reuse.

Cybersecurity

One of the most compelling use cases for string interning is in cybersecurity. If you’re managing complex security data, you know that graph databases are a natural fit: they let you integrate data from diverse sources, work with massive datasets, and uncover dependencies that traditional tools often overlook. By modeling a digital twin of your entire IT infrastructure as a graph, you can analyze relationships and quickly identify potential attack paths that cross multiple systems. For example, with queries like:

				
					MATCH (user:User) WHERE NOT (user)-[:IN_GROUP]->(group:SecurityGroup)
RETURN user

MATCH (sw:InstalledSoftware) WHERE NOT (sw)<-[:ALLOWED_BY_POLICY]-(p:Policy)
RETURN sw

				
			

… you can pinpoint users missing from security groups or software not protected by policy, all in real time.

In these scenarios, string interning helps you reduce memory usage dramatically by deduplicating repeated values like usernames, group names, or policy identifiers. This keeps your security graphs lean, fast, and ready for real-time analysis as your data grows.

Cloud Infrastructure and Security Management

You’ll see similar benefits in Cloud Infrastructure Entitlement Management (CIEM). If you’re tracking identities, roles, and permissions across your cloud environments, your graphs are probably full of repeated role names, permission types, and resource IDs. With string interning, each unique value is stored just once, so you can scale up without worrying about wasted memory or slow traversals.

Cloud Security Posture Management (CSPM) platforms are another area where string interning pays off. CSPM tools rely on graph databases to monitor cloud configurations for vulnerabilities and compliance. Here, configuration values, service names, and status labels show up over and over again. String interning keeps your graph database efficient, letting you analyze relationships across multiple clouds at speed. Whether you’re storing IP addresses, domain names, user IDs, or event categories, string interning gives you a scalable way to manage the repetitive data that’s everywhere in security-focused graphs.

Social & Knowledge Graphs

In our experience supporting large-scale knowledge graph deployments like VCPedia (venture capital data extraction) and Fractal KG (self-organizing graph structures), we’ve observed how repetitive string properties such as “country,” “category,” and “type” can rapidly inflate memory usage. Before string interning, each instance of a common value, like “United States”, would be stored separately across millions of nodes and edges, driving up hardware costs and impacting performance.

With string interning, we deduplicate these values automatically or using the intern() function. This change has delivered memory savings of 30%–50%.

Supply Chain & Product Catalogs

In supply chain and product catalog graphs, repeated attributes like product_code, SKU, and vendor_name can appear across millions of nodes and relationships. Previously, each repeated value would consume additional memory, making it challenging to scale as catalogs or supplier networks grew.

With string interning in FalkorDB, identifiers such as “SKU-12345” or “Acme Supplies” are stored only once, regardless of how often they appear. This simple change has cut memory usage by up to 50% in our internal benchmarks, enabling organizations to efficiently model larger, more complex supply networks and deliver insights faster—all while keeping infrastructure costs low.

Conclusion

String interning represents a powerful yet seamless optimization for anyone working with large graph datasets. This feature is designed to work in the background for everyday operations, while also giving you precise control for bulk updates, migrations, or data cleaning tasks. 

We encourage you to explore string interning in your own projects and see the impact firsthand. 

Ready to get started?

Check out our documentation, visit our GitHub repository, or join the conversation on our community forums. We’d love to hear about your use cases, results, and ideas!

FAQ

Editor’s note: The following FAQ has been updated to reflect current trends and best-practices in August 2025.

How does string interning in FalkorDB improve both memory usage and query performance?

String interning deduplicates repeated string values so each unique value is stored only once in memory. This can cut memory usage by 20%–60% on graphs with high attribute repetition. Query performance improves because equality checks on interned strings compare memory references instead of character sequences, reducing the cost of filters, aggregations, and joins on repeated attributes.

What are best practices for introducing string interning to an existing graph?

First, run GRAPH.MEMORY USAGE to identify properties with high duplication (e.g., city, status). Then, apply intern() in batches to avoid transaction spikes:

				
					MATCH (n)
WHERE n.city IS NOT NULL
WITH n LIMIT 10000
SET n.city = intern(n.city);

				
			

When should I apply string interning during data ingestion?

Use intern() at insert time for bulk loads, schema migrations, or ETL transformations where repeated strings are expected. Normalize values (e.g., toLower(), trim()) before interning to maximize deduplication. Example with CSV import:

				
					LOAD CSV FROM 'file://users.csv' AS row
CREATE (u:User {name: intern(toLower(trim(row[1])))});

				
			

This ensures every identical string maps to the same memory reference from the moment it’s inserted.

Citations
  1. FalkorDB Documentation — String Interning. FalkorDB v4.10 Release Notes. Available at: https://github.com/FalkorDB/FalkorDB/pull/1095

  2. FalkorDB Documentation — GRAPH.MEMORY USAGE Command. Available at: https://falkordb.com/docs/commands/graph.memory-usage/

  3. FalkorDB Documentation — Cypher intern() Function Reference. Available at: https://falkordb.com/docs/cypher/functions/string/#intern

  4. FalkorDB Docker Image. Available at: https://hub.docker.com/r/falkordb/falkordb

  5. FalkorDB Python Client. Available at: https://pypi.org/project/falkordb/


References