Highlights
- Array indexing eliminates full graph scans by treating each array element as an indexed key, reducing query complexity from O(N×M) to O(log N) for membership checks
- Range indexes enable binary search on sorted data for numeric, string, and geospatial queries, accelerating time-series and analytical workloads by 10-100x without property scans
- Native multivalued indexing removes the need for auxiliary nodes or many-to-many mappings, keeping schemas compact while maintaining sub-millisecond query latency at scale
Modern data is rarely simple or one-dimensional. Whether you’re building recommendation engines, modeling user interests, tagging content, or analyzing threat intelligence feeds, your entities often carry multiple values for a single attribute: think lists of skills, genres, tags, behaviors, or signals. These array workloads are everywhere, powering much of today’s most impactful applications.
Conventionally, graph databases have excelled at modeling relationships, but have struggled with efficiently searching within multivalued properties like arrays. Most graph engines are built for traversing edges, not for rapid lookups inside complex, nested attributes. Developers have had to choose between performance and clarity: either reshaping their data into cumbersome relationship-based schemas, or settling for the slowdowns of costly full graph scans.
In this article, we’ll cover the latest array and range indexing enhancements that unlock new levels of flexibility and speed for graph data modeling. We’ll walk through how these features work under the hood, explore real-world use cases, and see exactly how they can turbocharge your queries, with clear examples and performance insights.
Powering Real-World Queries with Array and Range Indexing
Array indexing has long posed a challenge for graph databases, impacting both performance and the flexibility of data modeling. Let’s see why this matters, using a simple example.
First, let’s start the FalkorDB graph database using Docker:
docker run -d \
--name falkordb-instance \
-p 6379:6379 \
-p 3000:3000 \
falkordb/falkordb:latest
Suppose you have the following set of data:
| Name | Hobbies |
| --- | --- |
| Alice | ["reading", "tennis"] |
| Bob | ["football", "tennis"] |
| Charlie | ["gaming", "reading"] |
You can create this graph with:
CREATE (:Person {name: 'Alice', hobbies: ['reading', 'tennis']}),
(:Person {name: 'Bob', hobbies: ['football', 'tennis']}),
(:Person {name: 'Charlie', hobbies: ['gaming', 'reading']})
"
Now, imagine you want to find everyone who enjoys “tennis”:
MATCH (p:Person) WHERE "tennis" IN p.hobbies RETURN p
Without indexing, the database must examine every person’s list to see if “tennis” is present. In a graph with a million people, it scans a million lists. This is called a full scan, resulting in O(n) complexity and poor scalability.
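To make the cost concrete, here is a minimal pure-Python model of what an unindexed engine must do: visit every node and walk its hobbies array, which is O(N × M) work for N nodes with average array length M. The data and function names are illustrative, not FalkorDB internals.

```python
# Hypothetical in-memory model of an unindexed membership query.
people = [
    {"name": "Alice", "hobbies": ["reading", "tennis"]},
    {"name": "Bob", "hobbies": ["football", "tennis"]},
    {"name": "Charlie", "hobbies": ["gaming", "reading"]},
]

def full_scan(nodes, hobby):
    """Check every node's array -- work grows linearly with the graph."""
    return [n["name"] for n in nodes if hobby in n["hobbies"]]

print(full_scan(people, "tennis"))  # ['Alice', 'Bob']
```

With a million people, this loop runs a million times per query, regardless of how few actually match.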
Why is this a problem?
For large graphs, query performance slows dramatically as data grows. Query time increases linearly with the number of nodes, making searches expensive in terms of compute and memory.
The Conventional Approach
One way to address this is to further normalize the data model. Instead of using arrays, you create individual nodes for each hobby and link people to their hobbies through relationships:
CREATE
(alice:Person {name: 'Alice'}),
(bob:Person {name: 'Bob'}),
(charlie:Person {name: 'Charlie'}),
(reading:Hobby {name: 'reading'}),
(tennis:Hobby {name: 'tennis'}),
(football:Hobby {name: 'football'}),
(gaming:Hobby {name: 'gaming'}),
(alice)-[:HAS_HOBBY]->(reading),
(alice)-[:HAS_HOBBY]->(tennis),
(bob)-[:HAS_HOBBY]->(football),
(bob)-[:HAS_HOBBY]->(tennis),
(charlie)-[:HAS_HOBBY]->(gaming),
(charlie)-[:HAS_HOBBY]->(reading)
For example, Alice would be modeled as:
Nodes:
(:Person {name: 'Alice'})
(:Hobby {name: 'tennis'})
Relationships:
(Alice)-[:HAS_HOBBY]->(tennis)
To find everyone who likes tennis, you would query:
MATCH (p:Person)-[:HAS_HOBBY]->(h:Hobby {name:"tennis"}) RETURN p
While this structure supports efficient lookups, it can introduce significant complexity and overhead in large, highly connected graphs. Adding extra relationships just to handle arrays often bloats the model and can impact performance.
Array Indexing to the Rescue
FalkorDB introduces native indexing support for array fields, directly addressing the inefficiency of array membership queries. You can store an array directly as a property, for example:
p.hobbies = ['reading', 'tennis']
and then create an index on it:
CREATE INDEX FOR (p:Person) ON (p.hobbies)
This enables highly efficient queries such as:
MATCH (p:Person)
WHERE "tennis" IN p.hobbies
RETURN p
Instead of scanning every node, FalkorDB performs an Index Scan on the hobbies array. Under the hood, each array element is treated as an individual index key, providing direct access to matching nodes. This translates to O(1) or O(log n) lookup times, a dramatic improvement in performance for large datasets.
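The core idea, each array element becoming its own index key, can be sketched as an inverted index in a few lines of Python. This is an illustrative model of the concept, not FalkorDB's actual data structure.

```python
from collections import defaultdict

people = {
    1: {"name": "Alice", "hobbies": ["reading", "tennis"]},
    2: {"name": "Bob", "hobbies": ["football", "tennis"]},
    3: {"name": "Charlie", "hobbies": ["gaming", "reading"]},
}

# Build the index: one entry per array element, mapping value -> node IDs.
index = defaultdict(set)
for node_id, props in people.items():
    for hobby in props["hobbies"]:
        index[hobby].add(node_id)

def find(hobby):
    """Membership lookup is one probe, independent of graph size."""
    return sorted(people[i]["name"] for i in index.get(hobby, set()))

print(find("tennis"))  # ['Alice', 'Bob']
```

The query now touches only the nodes recorded under the "tennis" key, rather than every node in the graph.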
Performance Comparison: Normalized vs. Indexed Arrays
You can test the performance difference using a simple Python script:
import time
from falkordb import FalkorDB


class FalkorDBQueryTimer:
    def __init__(self, host='localhost', port=6379):
        self.db = FalkorDB(host=host, port=port)
        self.graph = self.db.select_graph('query_perf_graph')

    def clear_graph(self):
        try:
            self.graph.delete()
        except Exception:
            pass  # graph may not exist yet

    def setup_normalized_data(self):
        self.graph.query("""
            CREATE
              (alice:Person {name: 'Alice'}),
              (bob:Person {name: 'Bob'}),
              (charlie:Person {name: 'Charlie'}),
              (reading:Hobby {name: 'reading'}),
              (tennis:Hobby {name: 'tennis'}),
              (football:Hobby {name: 'football'}),
              (gaming:Hobby {name: 'gaming'}),
              (alice)-[:HAS_HOBBY]->(reading),
              (alice)-[:HAS_HOBBY]->(tennis),
              (bob)-[:HAS_HOBBY]->(football),
              (bob)-[:HAS_HOBBY]->(tennis),
              (charlie)-[:HAS_HOBBY]->(gaming),
              (charlie)-[:HAS_HOBBY]->(reading)
        """)

    def setup_array_indexed_data(self):
        self.graph.query("""
            CREATE
              (:Person {name: 'Alice', hobbies: ['reading', 'tennis']}),
              (:Person {name: 'Bob', hobbies: ['football', 'tennis']}),
              (:Person {name: 'Charlie', hobbies: ['gaming', 'reading']})
        """)
        self.graph.query("CREATE INDEX FOR (p:Person) ON (p.hobbies)")

    def measure_query_time(self, query):
        start = time.time()
        self.graph.query(query)
        return time.time() - start

    def run_comparison(self):
        self.clear_graph()
        self.setup_normalized_data()
        normalized_time = self.measure_query_time(
            "MATCH (p:Person)-[:HAS_HOBBY]->(h:Hobby {name: 'tennis'}) RETURN p.name"
        )
        self.clear_graph()
        self.setup_array_indexed_data()
        array_time = self.measure_query_time(
            "MATCH (p:Person) WHERE 'tennis' IN p.hobbies RETURN p.name"
        )
        return normalized_time, array_time


if __name__ == "__main__":
    print("Running query time comparison in FalkorDB...")
    tester = FalkorDBQueryTimer()
    norm_time, arr_time = tester.run_comparison()
    print(f"Normalized Query Time: {norm_time:.6f}s")
    print(f"Array Indexed Query Time: {arr_time:.6f}s")
Sample output:
Normalized Query Time: 0.081214s
Array Indexed Query Time: 0.001433s
Range Indexing for Continuous Data
While array indexing is perfect for membership queries, range indexing complements it by enabling fast queries on ordered data such as numerics, strings, and geospatial fields. Range indexing is essential for use cases involving inequalities, sorting, or range-based filters.
For example, to find people within a specific age range:
MATCH (p:Person)
WHERE p.age >= 25 AND p.age <= 35
RETURN p
Without a range index, this query scans every node and checks the age property: an O(n) operation. But by creating a range index:
CREATE INDEX FOR (p:Person) ON (p.age)
… the database can jump directly to nodes within the relevant age range, reducing query time to O(log n).
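The mechanics can be sketched with Python's `bisect` module over a sorted list of `(age, node)` pairs standing in for the range index; this is a conceptual model, not FalkorDB's internal layout.

```python
import bisect

# Sorted (age, node) pairs standing in for a range index on p.age.
age_index = sorted([(31, "Alice"), (24, "Bob"), (29, "Charlie"), (40, "Dana")])

def range_query(lo, hi):
    """Binary-search to the first entry >= lo, then scan until > hi."""
    start = bisect.bisect_left(age_index, (lo,))
    out = []
    for age, name in age_index[start:]:
        if age > hi:
            break  # terminate early at the range boundary
        out.append(name)
    return out

print(range_query(25, 35))  # ['Charlie', 'Alice']
```

Only the entries inside the range are touched; nodes outside it are never visited.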
Range indexing supports a variety of property types and queries, including:
- Numeric ranges: WHERE p.salary >= 50000 AND p.salary <= 75000
- String ranges: WHERE p.lastName >= 'M' AND p.lastName < 'S'
- Date/time ranges: WHERE e.created_at > timestamp() - 86400
- Geospatial ranges: WHERE distance(p.location, point) < 5000
These capabilities make it easy to build efficient, real-time search and analytics features on top of your graph data.
Indexing Multivalued Attributes in FalkorDB
Let’s look at how FalkorDB supports indexing on multivalued attributes, enabling efficient execution of queries involving array membership conditions. This capability is powered by enhancements in the indexing pipeline and the query execution engine. Consider the following query:
MATCH (n:N) WHERE 2 IN n.v RETURN n
where n.v is an array (a multivalued attribute).
When this query is parsed, the engine detects that n.v is multivalued. It prepares the filter expression and identifies the relevant index path. Each element within the array is treated as an individual indexable value, and index entries are created for each value present in the array field. The index can then quickly return node IDs where the array contains the queried value, avoiding full scans and ensuring fast retrieval based on indexed entries. The matching node IDs are passed back to the query engine, which then returns the final results.
The multi_val_type parameter acts as a key differentiator during index registration and lookup. An important feature here is type-aware field naming for arrays: FalkorDB distinguishes between different multivalued types by appending type-specific suffixes, such as :string:arr or :numeric:arr, to the field name. This ensures accurate indexing and retrieval based on the array element’s data type.
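The suffix names `:string:arr` and `:numeric:arr` come from the article above; the mechanics below are a hypothetical sketch of how such type-aware field naming keeps same-valued elements of different types from colliding in one index.

```python
# Hypothetical sketch of type-aware index field naming.
def typed_field(prop, value):
    """Derive the index field name from the element's runtime type."""
    kind = "numeric" if isinstance(value, (int, float)) else "string"
    return f"{prop}:{kind}:arr"

entries = {}

def index_array(node_id, prop, values):
    # One index entry per element, keyed by (typed field name, value).
    for v in values:
        entries.setdefault((typed_field(prop, v), v), set()).add(node_id)

index_array(1, "v", [2, "2"])  # numeric 2 and string "2" land in distinct fields
print(sorted(k[0] for k in entries))  # ['v:numeric:arr', 'v:string:arr']
```

Because the element type is baked into the field name, a lookup for the number 2 never matches nodes whose array holds the string "2".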
Performance Benefits
Array indexing is a powerful tool for optimizing query execution, especially in databases with flexible data models, such as graph or document stores. By indexing individual elements within array fields, expensive filter operations are converted into fast, index-backed lookups, delivering both speed and scalability. Here are the key performance benefits:
Reduced Scan Overhead
Without array indexing, queries that filter on array membership must scan all records and iterate through each array to find matches. This results in O(N) or O(N × M) complexity, where N is the number of records and M is the average array length.
With array indexing, each array element is indexed independently, reducing query complexity to O(log N) or even constant time with hash-based indexes. Only relevant records are touched, avoiding unnecessary data access.
Faster Query Execution
Element-wise indexing dramatically reduces execution time, especially for large datasets and selective filters. For example, a query like:
MATCH (n) WHERE "security" IN n.topics RETURN n
no longer scans every node’s topics array; instead, the engine retrieves only nodes known to contain “security” via the index. Benchmarks and studies routinely show 10× to 100× speedups when replacing full scans with indexed lookups on multivalued fields.
Improved Index Selectivity and Filter Pushdown
Indexing array elements increases filter selectivity, allowing the query planner to make smarter execution choices. When array filters are pushed down to the index level, the engine doesn’t need to evaluate each record manually, reducing the search space and increasing efficiency. This enables several key optimizations:
- Join order optimization: The planner can choose more efficient join strategies with smaller, filtered input sets.
- Fewer intermediate results: Early filtering prevents large sets of irrelevant data from being generated and discarded later.
- More efficient downstream operations: Projections and sorting are faster since fewer rows proceed after filtering, reducing memory and CPU usage.
This is especially valuable for compound queries, where one clause filters on an array and another on a scalar field.
More Predictable Latency Under Load
Array scans can lead to unpredictable performance, especially as concurrency rises or array lengths vary across records. This variability causes latency spikes and inconsistent response times, problems that grow in large-scale or real-time environments.
Array indexing enables direct lookups, delivering consistent and bounded execution times regardless of data volume or structure. This deterministic behavior reduces tail latencies (e.g., lower 99th-percentile latency), increases throughput for transactional and analytical workloads, and improves responsiveness for interactive or latency-sensitive apps. It also leads to better resource utilization, predictable scaling, and improved quality of service.
Scalable Data Modeling
With array indexing, developers can model multivalued attributes naturally, without having to break them into normalized tables or separate relationships just to make values queryable. For example, a property like [“AI”, “Graph”, “Security”] can be efficiently queried and stored as a single attribute, avoiding the need for auxiliary nodes or many-to-many mappings. This keeps the data model close to its real-world representation, easier to understand, and more compact.
This approach avoids artificial entity inflation, where simple string values are unnecessarily represented as full nodes for searchability, and eliminates complex joins or traversals to fetch what is logically inline metadata. Besides simplifying queries, array indexing reduces the overhead of managing complex schemas, multiple indexes, and repeated writes common in overly split or highly connected data models.
Optimized Range Queries
Traditional graph databases struggle with range queries, often requiring full property scans to evaluate inequalities. FalkorDB’s range indexes store data in sorted order, enabling the engine to:
- Binary search to the start of the range in O(log n) time
- Sequentially scan only the relevant index entries within the range
- Terminate early when the range boundary is exceeded
For time-series data, geospatial queries, and analytical workloads involving aggregations over numeric ranges, this can accelerate query performance by 10–100× compared to unindexed approaches.
Array Indexing for Queries with Filters
To see the benefits of array indexing at scale, consider a security system that stores large volumes of alerts or events. Each alert is classified in multiple ways: by threat tags, attack techniques, and affected asset types. For example, an alert node may look like:
{
"alert_id": "A-93421",
"flags": ["malware", "ransomware", "lateral_movement"],
"techniques": ["T1059", "T1486"],
"affected_assets": ["endpoint", "server"]
}
Suppose your security operations center tracks thousands of such alerts daily. You may want to filter for all alerts that are tagged as “malware,” involve Remote Code Execution techniques (“T1059”), and target endpoint devices. With FalkorDB’s array indexing, such queries remain fast, even at large scale.
For illustration, let’s ingest a large dataset with these properties:
import random

from falkordb import FalkorDB

db = FalkorDB()
g = db.select_graph('cyber_alert_graph')

# Clean up any previous runs
try:
    g.delete()
except Exception:
    pass

flags_list = ["malware", "phishing", "abnormal_login", "ransomware", "lateral_movement"]
techniques_list = ["T1059", "T1047", "T1566", "T1204", "T1486"]
assets_list = ["endpoint", "cloud", "network", "server", "mobile"]

# Create 15,000 alerts, each with random classifications
for i in range(15000):
    flags = random.sample(flags_list, k=random.randint(1, 3))
    techniques = random.sample(techniques_list, k=random.randint(1, 2))
    assets = random.sample(assets_list, k=random.randint(1, 2))
    g.query(
        f"CREATE (:Alert {{alert_id: '{i}', flags: {flags}, "
        f"techniques: {techniques}, affected_assets: {assets}}})"
    )
Now, create indexes across each property:
CREATE INDEX FOR (a:Alert) ON (a.flags)
CREATE INDEX FOR (a:Alert) ON (a.techniques)
CREATE INDEX FOR (a:Alert) ON (a.affected_assets)
You can now run precise queries to filter the data based on any combination of properties:
MATCH (a:Alert)
WHERE "malware" IN a.flags
AND "T1059" IN a.techniques
AND "endpoint" IN a.affected_assets
RETURN a.alert_id
Under the hood, each IN clause leverages its own index, allowing the engine to jump directly to matching alerts instead of scanning the entire dataset. This keeps lookups consistently fast as the graph grows to tens of thousands of alert nodes and beyond.
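Conceptually, a compound filter like this reduces to intersecting the candidate sets returned by each index. The toy posting lists below are illustrative, not FalkorDB internals.

```python
# Illustrative model: each IN predicate probes its own index, and the
# candidate ID sets are intersected before any node data is read.
flags_idx = {"malware": {1, 2, 5}, "phishing": {3}}
technique_idx = {"T1059": {2, 4, 5}, "T1486": {1}}
asset_idx = {"endpoint": {2, 3, 5}, "server": {1, 4}}

candidates = (
    flags_idx.get("malware", set())
    & technique_idx.get("T1059", set())
    & asset_idx.get("endpoint", set())
)
print(sorted(candidates))  # [2, 5]
```

Only the surviving IDs are fetched, so the cost tracks the size of the smallest candidate set rather than the size of the graph.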
You can check the query time as follows:
import time

g = db.select_graph('cyber_alert_graph')

start = time.time()
g.query(
    "MATCH (a:Alert) "
    "WHERE 'malware' IN a.flags AND 'T1059' IN a.techniques "
    "AND 'endpoint' IN a.affected_assets "
    "RETURN a.alert_id"
)
print(f"Query time: {time.time() - start:.6f}s")
Query time: 0.004992s
Industry Use Cases
Array indexing shines in real-world scenarios where entities naturally carry multiple values within a single property. Here are some impactful application areas:
Recommendation Engines
In recommendation systems, both items and users often carry arrays of features such as genres, tags, preferences, or behaviors. For example, a movie might be described with:
genres: ["adventure", "sci-fi", "action"]
With array indexing, queries like:
MATCH (m:Movie) WHERE "adventure" IN m.genres RETURN m
are resolved via index lookups rather than by scanning every node and checking the genres field. This is crucial for large catalogs that require real-time content matching, such as suggesting similar shows or filtering by user preferences.
Tagging Systems
On content platforms, blogs, e-commerce sites, or content management systems, nodes such as articles or products are frequently tagged with multiple labels:
tags: ["AI", "machine-learning", "tutorial"]
Array indexing allows for quick filtering, for example:
MATCH (a:Article) WHERE "AI" IN a.tags RETURN a
Without indexing, this query would require scanning all Article nodes and comparing their tags. With indexing, the system can instantly jump to tagged articles, resulting in better performance and a more scalable backend for faceted search and tag-based navigation.
Social Graphs
User-driven platforms often use arrays for properties such as:
- Interests: ["cycling", "photography"]
- Skills: ["Python", "GraphQL"]
- Group memberships: ["Data Science Club", "Hackathon 2025"]
For example:
MATCH (u:User) WHERE "GraphQL" IN u.skills RETURN u
Array indexing makes such queries highly efficient at scale, allowing instant identification of users with shared attributes. This enables features like interest-based friend suggestions or filtered user directories in community apps.
Temporal or Sequence Data
In time-series or event-driven data models, array properties might store multiple values per event, such as response codes, behavior flags, or sentiment labels:
scores: [5, 4, 5, 3]
If each relationship or event holds an array like scores, you might run:
MATCH ()-[e:REVIEWED]->() WHERE 5 IN e.scores RETURN e
Here, array indexing avoids costly relationship scans and enables efficient filtering based on time-based attributes, anomaly detection flags, or user feedback. This is especially useful in analytics pipelines, where each signal may carry a list of numeric or categorical indicators.
Cybersecurity Graphs
In cybersecurity, graph models often represent events, entities, and indicators of compromise (IOCs). A single node, such as a file, IP address, or user session, might have multiple threat labels, detection signatures, or behavior flags. For example:
flags: ["malware", "suspicious_login", "abnormal_traffic"]
Using array indexing, queries like:
MATCH (a:Alert) WHERE "malware" IN a.flags RETURN a
can instantly retrieve all alerts tagged with “malware,” without scanning every alert node. This is critical for real-time threat detection, allowing security teams to pivot quickly from one IOC to another.
Array indexing also makes it possible to efficiently correlate diverse data sources, such as logs, threat feeds, and user activity, helping systems surface connected entities or patterns that share common indicators. In large-scale security graphs, this capability drastically reduces investigation and triage response times.
Best Practices
To get the most out of array and range indexing, follow these recommended practices. They will help ensure your indexes remain performant, compact, and reliable over time.
Index Only Scalar Arrays
Focus on arrays of simple, scalar types such as integers, strings, or floats. Avoid indexing complex or nested structures (like arrays of objects or maps), as these can degrade performance and inflate index size. In line with general indexing guidance, prioritize high-value, high-cardinality fields. Over-indexing, or indexing fields with few unique values, often introduces unnecessary overhead and can slow down write performance.
Avoid Null Values in Arrays
Including null values in indexed arrays can cause unpredictable behavior. Null elements may require special handling in the index structure, potentially being ignored or complicating lookups. The best approach is to design your schema and application logic to omit nulls entirely, ensuring only meaningful data is indexed. This aligns with conventional database best practices, which discourage “null pollution” to keep indexes clean and reliable.
Maintain Consistent Element Types
All elements within an array should have the same data type. For example, an integer array should not contain strings or decimals. Mixing data types forces the index to accommodate heterogeneous values, reducing efficiency and increasing maintenance complexity. Uniform element types keep indexes smaller, more selective, and better optimized for scans and lookups.
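A small application-side guard, hypothetical and not part of any FalkorDB API, can enforce both of the preceding rules (no nulls, uniform element types) before a write reaches an indexed property:

```python
def is_index_friendly(values):
    """Reject empty, null-containing, or mixed-type arrays before indexing."""
    if not values or any(v is None for v in values):
        return False
    first = type(values[0])
    return all(type(v) is first for v in values)

assert is_index_friendly(["reading", "tennis"])
assert not is_index_friendly([1, "one"])   # mixed types
assert not is_index_friendly(["a", None])  # null pollution
```

Running such a check in the write path keeps the index homogeneous without relying on downstream error handling.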
Optimize Range Index Column Order
When creating composite indexes that combine range and equality conditions, use the “equality first, range second” principle. For example, if your query combines an exact match and a range:
// Optimal: equality condition first, range condition second
CREATE INDEX FOR (p:Person) ON (p.department, p.salary)

// Query benefits from ordered index traversal
MATCH (p:Person)
WHERE p.department = 'Engineering' AND p.salary >= 75000
RETURN p
This ordering allows the index scan to start with the specific department and then traverse only the relevant salary range within that department, minimizing the portion of the index that must be scanned.
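Why equality-first works can be seen with sorted composite tuples: putting the equality column first makes each department a contiguous run in the index, so the range scan stays inside that run. A conceptual sketch, not FalkorDB internals:

```python
import bisect

# Composite entries sorted as (department, salary, node): the equality
# column groups each department into one contiguous index run.
idx = sorted([
    ("Engineering", 70000, "Ann"), ("Engineering", 80000, "Ben"),
    ("Engineering", 95000, "Cy"), ("Sales", 90000, "Di"),
])

def dept_salary_at_least(dept, min_salary):
    """Seek to (dept, min_salary), then scan until leaving the department."""
    start = bisect.bisect_left(idx, (dept, min_salary))
    out = []
    for d, s, who in idx[start:]:
        if d != dept:
            break  # left the department's contiguous run
        out.append(who)
    return out

print(dept_salary_at_least("Engineering", 75000))  # ['Ben', 'Cy']
```

Had the salary column come first, Engineering rows would be scattered across the index and the scan could not stop early.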
The array and range indexing capabilities introduced in FalkorDB mark a significant leap forward in graph database technology. By solving long-standing challenges with list-based data structures, these features address a critical market need that has been overlooked for years. For developers and teams working with complex, multivalued data, FalkorDB now offers a uniquely powerful combination of performance, efficiency, and intuitive modeling.
FAQ
How does array indexing differ from traditional graph traversal for filtering?
Array indexing enables direct O(log n) lookups by treating each array element as an individual index key, avoiding O(n) full scans that check every node’s array property.
What query patterns benefit most from range indexing in FalkorDB?
Inequality filters (age >= 25), sorted retrieval, time-series bounds, geospatial distance queries, and aggregations over numeric ranges see 10-100x speedups via binary search.
Can I index nested arrays or complex objects in FalkorDB?
No. Range and array indexes support scalar types only: integers, floats, strings, and geospatial points. Nested structures require flattening before indexing.