Beyond Rows and Columns: Exploring the Missing Third Dimension

img-2

If you are working with data, you might be familiar with the concepts of rows and columns, which are the basic building blocks of most database models. However, there is another dimension that is often overlooked or ignored, which can offer new possibilities and insights for your data analysis. 

In this blog post, I will compare three common database storage models: row-based, column-based and network-based. These models affect how data is stored, accessed and manipulated in a database system along three dimensions. 

The first dimension

The first dimension, and the most common one, is the row-based (aka Relational) model, where data is stored in rows, where each row represents a record or an entity. For example, a row in a table of customers might contain the name, address, phone number and email of a single customer.

Some examples of row-based databases are MySQL, PostgreSQL and Oracle. Row-based databases are good for transactional processing, where you need to insert, update or delete individual records quickly and frequently. They are also good for queries that involve many columns or attributes of a record, such as finding all the customers who live in a certain city and have a certain age range. In a broader view Document databases and Key-Value databases like MongoDB and Redis are also Row-based databases, and are optimized for full entity retrieval. 

The second dimension

The second dimension is the column-based (aka Columnar) model, where data is stored in columns, where each column represents an attribute or a feature of a record. For example, a column in a table of customers might contain the names of all the customers, another column might contain their addresses, and so on. 

Some examples of column-based databases are Cassandra, ScyllaDB and Amazon Redshift. Column-based databases are good for analytical processing, where you need to perform calculations or aggregations on large amounts of data. They are also good for queries that involve few columns or attributes of a record, such as finding the average age of all the customers. In a broader view Time Series databases like InfluxData and Timescale are also column-based databases, and are optimized for multi-events aggregations. 

The third model

The third model is the network-based model (aka Graph), where data is stored in records that have references to other records. For example, a record in a set of customers might contain the name and address of a single customer, as well as a pointer to another record that contains the phone number and email of the same customer. Some examples of Graph databases are Neo4j, Amazon Neptune and FalkorDB

Found the third dimension

Graph databases are not just another type of database, but rather the missing third dimension in the database world. Unlike the first two dimensions, which optimized the way the data is stored for their use cases, most graph databases use Adjacency lists, which are not efficient for retrieving and traversing cross-entity references (edges). FalkorDB is the first and only database that is optimized for the third dimension, putting the edge storage at the center by using GraphBLAS to represent the relations topology. By replacing adjacency lists with adjacency matrices, FalkorDB ensures that the edge storage is the most suitable for use cases that need to access edges and traverse the graph.

As you can see, each database storage model has its own advantages and disadvantages depending on the type and purpose of your data. You should choose the model that best suits your needs and requirements. In some cases, you might even use a hybrid approach that combines different models to optimize your performance and functionality.

GraphRAG, CodeGraph and Graph DBMS news, guides and opinions delivered weekly. No spam, cancel anytime.