Highlights
- Code graph analysis visualizes the function calls, class inheritance, and module imports that determine how software actually behaves, offering an alternative to outdated architectural diagrams.
- Graph representations expose dead code and orphaned functions as nodes with no incoming connections, helping teams refactor legacy code with less risk.
- Visual code graphs accelerate developer onboarding by mapping file relationships that take hours to trace manually through static text.
When developers begin working on an existing codebase, they are often handed a collection of directories, hundreds of files, and (if they are lucky) an outdated architectural diagram. Each file may define one or more services, tasks, or features, and while most applications follow similar directory hierarchies, these structures exist to help humans navigate the project.
Computers do not execute software based on file organization. They rely on the relationships across the files – functions calling functions, classes extending classes, and modules importing modules that determine the outcome of the software.
As codebases grow and evolve, these interconnected relationships multiply, shift, and sometimes disappear completely. As developers come and go, the tribal knowledge of how the files relate to one another is lost, and the architecture begins to drift from its original underpinnings.
As developers, how can we address this architectural invisibility, and become more aware of these relationships? One option is to use a graph database to visualize the relationships used when actually running the code.
What Does It Mean to “Graph” Source Code?
A graph database represents objects together with the relationships those objects share with one another.
- In a graph model, nodes represent entities in the database (for example, in an organizational database, a person would be a node).
- Edges represent the relationships between the entities (in the org chart example, edges would capture reporting lines or managerial relationships).
A graph can be easily visualized, resembling the ball-and-stick models used in chemistry class. Instead of a list of atoms, we have a visual map of how they connect into a molecule.
We often find ourselves jumping through source code to find functions, variables, and classes to “connect the dots” and understand how the code works. By placing our code into a graph, all of these connections are automatically visualized. Developers can quickly see the same connections that the compiler or interpreter follows when building and running the code.
We are no longer dependent on static text or out-of-date architectural diagrams; instead, we have a living graph of our code.
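To make the idea concrete, here is a minimal sketch of turning source code into nodes and edges using Python's built-in `ast` module. The sample source and the `build_call_graph` helper are invented for illustration; this is not how the FalkorDB tool works internally, only a small demonstration of the concept.

```python
import ast
from collections import defaultdict

# Toy source code to analyze (hypothetical, for illustration only).
SOURCE = '''
class ChatSession:
    def send(self, text):
        return text.upper()

def make_session():
    return ChatSession()

def main():
    session = make_session()
    return session.send("hi")
'''


def build_call_graph(source):
    """Map each top-level function to the names it calls directly."""
    tree = ast.parse(source)
    edges = defaultdict(set)
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue  # only top-level functions become caller nodes here
        for child in ast.walk(node):
            if isinstance(child, ast.Call):
                func = child.func
                if isinstance(func, ast.Name):         # e.g. make_session()
                    edges[node.name].add(func.id)
                elif isinstance(func, ast.Attribute):  # e.g. session.send()
                    edges[node.name].add(func.attr)
    return dict(edges)


graph = build_call_graph(SOURCE)
for caller, callees in sorted(graph.items()):
    print(caller, "->", sorted(callees))
```

Each key is a node, and each (caller, callee) pair is an edge. A real tool also has to resolve names across files and handle methods, imports, and inheritance, which this sketch deliberately ignores.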
Why Turn Code into a Graph?
As code evolves and developers churn off of a project, some of the knowledge about what functions and classes do is lost. Some classes are rewritten, but the old code is not removed because the team doesn’t know if it is safe to delete it. So dead, obsolete, or legacy code remains for the just-in-case scenarios.
As new developers onboard, they spend hours looking at files, and switching between them to track functions and variables in the way they are processed while the application is running.
By having a graph visualization showing the connections in the code, developers can quickly ascertain how the program works, and more easily follow the code’s progression. New developers are able to onboard faster, because they can “see” the connections between the files.
Is there “dead” code that is no longer in use? Or “almost dead” code that could be refactored out? With a graph, it is easy to spot classes or functions that are no longer in use. If a node has no connections, the code might no longer be in use, which flags it for further analysis (and potential removal if it is confirmed to be unused). If there are only one or two connections into old, complicated legacy code, a small refactor may be enough to modernize the system and remove the scary vestigial code.
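The check described above can be sketched in a few lines. This is a simplified illustration, assuming the graph is available as a list of (caller, callee) pairs; every function name in the sample data is hypothetical.

```python
# Hypothetical nodes and call edges, for illustration only.
NODES = ["main", "load_config", "parse_args", "legacy_export", "old_formatter"]
EDGES = [
    ("main", "load_config"),
    ("main", "parse_args"),
    ("legacy_export", "old_formatter"),
]
ENTRY_POINTS = {"main"}  # known entry points are expected to have no callers


def incoming_counts(nodes, edges):
    """Count how many edges point at each node."""
    counts = {node: 0 for node in nodes}
    for _caller, callee in edges:
        if callee in counts:
            counts[callee] += 1
    return counts


def dead_code_candidates(nodes, edges, entry_points):
    """Nodes that nothing calls and that are not known entry points."""
    counts = incoming_counts(nodes, edges)
    return sorted(n for n, c in counts.items() if c == 0 and n not in entry_points)


def lightly_used(nodes, edges, max_callers=2):
    """Nodes with only a handful of callers: possible small-refactor targets."""
    counts = incoming_counts(nodes, edges)
    return sorted(n for n, c in counts.items() if 1 <= c <= max_callers)


print(dead_code_candidates(NODES, EDGES, ENTRY_POINTS))
print(lightly_used(NODES, EDGES))
```

Note that `old_formatter` is not reported as a candidate because `legacy_export` still calls it, even though `legacy_export` itself has no callers. Walking the graph from the entry points and flagging everything unreachable would catch that case too. Either way, the result is a list of candidates to investigate, not proof that the code is safe to delete.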
As your system grows and evolves, a visualization also provides a natural way to identify code clusters that might be extracted into a microservice.
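As a rough illustration of cluster detection, the sketch below groups nodes into connected components, treating every call edge as undirected. This is a crude stand-in, assuming a toy graph with hypothetical function names; real codebases are usually one large component, so a proper community-detection algorithm would be a better fit in practice.

```python
from collections import defaultdict

# Hypothetical call edges, for illustration only.
EDGES = [
    ("checkout", "cart"),
    ("cart", "pricing"),
    ("send_email", "render_template"),
    ("render_template", "load_locale"),
]


def connected_components(edges):
    """Group nodes that are linked by any chain of edges, ignoring direction."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk from this node collects one component.
        stack = [start]
        seen.add(start)
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(group))
    return components


clusters = connected_components(EDGES)
for cluster in clusters:
    print(cluster)
```

In this toy data the checkout code and the email code never touch each other, which is the kind of separation that might suggest a service boundary worth a closer look.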
How to Build a Code Graph
FalkorDB has built a Code Graph analysis tool that examines GitHub repositories and builds a code graph of the Files, Functions, and Classes present in the codebase. It can be run locally on your system, and after analysis, a code graph is created:
This is the code graph for the GraphRAG-SDK from FalkorDB. Using this graph, we can see how the functions and classes in each file interact with one another.
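Because the result is stored as a graph, it can also be queried with Cypher rather than only browsed visually. The sketch below uses the `falkordb` Python client and assumes a FalkorDB instance is running locally with the analyzed repository already loaded. The graph name `my-repo`, the `CALLS` relationship type, and the `name` property are assumptions made for illustration; check the schema the tool actually produces before relying on them.

```python
# Cypher query: which functions call a given function?
# ASSUMPTION: `CALLS` edges and a `name` property exist in the generated graph.
CALLERS_QUERY = (
    "MATCH (caller:Function)-[:CALLS]->(callee:Function {name: $name}) "
    "RETURN caller.name"
)


def find_callers(function_name, graph_name="my-repo", host="localhost", port=6379):
    """Return the names of functions that call `function_name`.

    Requires the `falkordb` package and a reachable FalkorDB server.
    `graph_name` is a hypothetical placeholder.
    """
    # Imported lazily so this module loads even without the client installed.
    from falkordb import FalkorDB

    db = FalkorDB(host=host, port=port)
    graph = db.select_graph(graph_name)
    result = graph.query(CALLERS_QUERY, {"name": function_name})
    return [row[0] for row in result.result_set]
```

With a server running, `find_callers("some_function")` would return the callers recorded in the graph for that function.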
Understanding Connections
For example, we can see that the KnowledgeGraph class defines a ChatSession function:
The ChatSession function, in turn, calls and returns the ChatSession class defined in the ChatSession.py file.
While this is a simple example, it is powerful to see the connections visually.
The graph allows developers to easily trace connections, and then examine the code in the files once the connections have been made.
Looking at the graph, the blue File node and the pink Function nodes seem relatively isolated from the rest of the code. Being curious, we investigate this section of the code graph.
The file is named `graph_metrics.py`, which sounds interesting. I wonder what this code measures? Clicking on two of the nodes, I learn that their names are GraphContextualRelevancy and GraphContextualRecall. The tool reads the code and, on clicking, shows the comment from the code describing each of them:
Both seem very important: they check the accuracy and relevance of the output response from an LLM.
But why are they so isolated from the rest of the code?
The only two entry points come from tests:
So it appears that the checks on response output quality are only being used in testing. Perhaps the developers, on seeing this visually in the graph, will connect these classes to non-test code paths, so that the responses in production are also checked.
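The same investigation can be automated: for each node, collect its callers and check whether every one of them lives in a test file. Below is a minimal sketch. The edge list is illustrative and is not the actual GraphRAG-SDK graph; the test names, file paths, and the `path::name` identifier format are all assumptions.

```python
from collections import defaultdict

# Illustrative (caller, callee) edges using a hypothetical "path::name" format.
CALLS = [
    ("tests/test_metrics.py::test_relevancy", "graph_metrics.py::GraphContextualRelevancy"),
    ("tests/test_metrics.py::test_recall", "graph_metrics.py::GraphContextualRecall"),
    ("kg.py::KnowledgeGraph.chat_session", "chat_session.py::ChatSession"),
]


def is_test_node(node_id):
    """Treat anything under tests/ or in a test_*.py file as test code."""
    path = node_id.split("::", 1)[0]
    filename = path.rsplit("/", 1)[-1]
    return path.startswith("tests/") or filename.startswith("test_")


def targets_used_only_by_tests(calls):
    """Return callees whose every caller is test code."""
    callers = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)
    return sorted(
        callee
        for callee, sources in callers.items()
        if all(is_test_node(source) for source in sources)
    )


only_tested = targets_used_only_by_tests(CALLS)
for node in only_tested:
    print(node)
```

A list like this is a prompt for a conversation rather than a verdict: code that is only reached from tests may be intentional tooling, or it may be functionality that was never wired into production.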
Conclusion: The Future of Codebase Analysis is Graph-Shaped
As a developer, I have spent hours bopping between files in a codebase trying to understand what the code is doing, and how the variables change as the code progresses. With classes calling classes and functions in different files, the day often ends with a massive headache, and only partial understanding of the connections between files.
The presence of a code graph greatly simplifies the process of understanding and debugging code. I could imagine keeping the graph visualization of my code open on a second monitor as I track code operations.
By giving a visual representation of complex codebases, graphs are becoming an indispensable tool for developers.
FAQ
What is code graph analysis?
Code graph analysis parses repositories to map nodes (files, functions, classes) and edges (calls, imports, inheritance) into visual graphs that approximate the paths the code follows when it runs.
How does code graph analysis identify dead code?
Functions or classes with zero incoming edges in the graph are candidates for unused code and should be reviewed before removal. Nodes with only one or two connections flag refactor opportunities to eliminate legacy dependencies.
Can code graph analysis run on private repositories?
FalkorDB’s code graph tool runs locally on your system, analyzing GitHub repositories without external data transmission to maintain codebase privacy.
References and citations
- Codegraph repo: https://github.com/FalkorDB/code-graph
- Proline model image: Peter Murray-Rust, CC BY-SA 2.5 (https://creativecommons.org/licenses/by-sa/2.5), via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:Proline_model.jpg