Graph Databases for Cloud Security Posture Management (CSPM)
Last Updated on September 29, 2025 by Editorial Team
Author(s): Sai Bhargav Rallapalli
Originally published on Towards AI.
Cloud infrastructure is a dynamic, sprawling landscape. As organizations embrace multi-cloud and hybrid environments, managing security becomes a complex, multi-dimensional challenge. Traditional security tools often struggle to provide the context needed to understand a single point of failure or a hidden attack path. This is where a graph-based approach to Cloud Security Posture Management (CSPM) comes in.

CSPM’s Importance
Cloud Security Posture Management (CSPM) is the continuous process of monitoring and managing an organization’s cloud environment to ensure compliance, detect misconfigurations, and prevent security breaches.
Some common challenges in CSPM include:
- Sprawling Resources: The sheer number of virtual machines, S3 buckets, security groups, and IAM roles makes it difficult to maintain a complete overview.
- Misconfigurations: A simple mistake, like an overly permissive security group rule, can expose an entire network.
- Hidden Attack Paths: Attackers exploit chains of relationships. For example, a publicly exposed VM might lead to a database through a series of roles and permissions.
Traditional tools, like dashboards and SIEMs (Security Information and Event Management Systems), often provide a fragmented, tabular view of data. They are great at showing a single event but fail to connect the dots. A graph-based approach, however, treats your cloud infrastructure as an interconnected network, which is exactly what it is.
FalkorDB, an open-source, in-memory graph database, is a good fit for this task. It runs as a Redis module and keeps the graph in memory, which helps keep traversal queries fast. It supports the openCypher query language, which is intuitive and powerful, and it’s developer-friendly, so you can get up and running quickly.
What is CSPM?
At its core, CSPM is about continuous monitoring and compliance enforcement.
The key objectives of CSPM are to:
- Detect Misconfigurations: Proactively find misconfigured resources, like unencrypted databases or publicly accessible storage buckets.
- Identify Compliance Risks: Map your cloud posture to regulatory frameworks like CIS, NIST, and PCI-DSS.
- Maintain Least Privilege Access: Ensure users and services have only the permissions they need to do their jobs.
- Enforce Policies: Automate policy enforcement across multi-cloud environments.
In modern hybrid and multi-cloud setups, CSPM is critical because the complexity and interdependencies make it impossible to rely on manual checks alone.
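The least-privilege objective can be illustrated with a small set comparison. This is a minimal sketch, assuming you already have the list of actions a role is granted and the list it was actually observed using (both lists here are made-up sample data). It ignores wildcards and condition keys, which a real policy analysis has to handle.

```python
def excess_permissions(granted, used):
    """Return granted actions that were never observed in use, sorted."""
    return sorted(set(granted) - set(used))


# Hypothetical sample data for one role.
granted_actions = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
used_actions = ["s3:GetObject"]

unused = excess_permissions(granted_actions, used_actions)
print("Candidates for removal:", unused)
```

Any action that shows up in the result is a candidate for removal from the role, subject to review.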
Why Graphs for CSPM?
Cloud infrastructure is highly interconnected. A user is related to a role, which is attached to a virtual machine, which in turn is a member of a security group that allows ingress from an IP range. A graph database is the most natural way to model these connections.
Tabular views simply cannot capture these relationships without complex, resource-intensive joins. Graph databases, however, enable:
- Attack Path Analysis: Perform multi-hop queries to trace a potential path from an external attacker to a sensitive data store.
- Visibility into Lateral Movement: Understand how an attacker could move from one compromised resource to another.
- Faster “What-If” Queries: Quickly analyze the impact of a change, such as adding a new user or opening a firewall port.
- Holistic View: Gain a complete, contextual understanding of your cloud assets.
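To make the multi-hop idea concrete before bringing in a database, here is a minimal pure-Python sketch of attack path analysis over an adjacency list. The resource names and the `find_paths` helper are hypothetical. A graph database performs this kind of traversal natively, so treat this only as an illustration of the idea.

```python
from collections import deque

# Hypothetical toy inventory: each key points to the resources it can reach.
EDGES = {
    "0.0.0.0/0": ["web-sg"],
    "web-sg": ["web-vm"],
    "web-vm": ["s3-access-role"],
    "s3-access-role": ["sensitive-bucket"],
    "internal-vm": ["s3-access-role"],
}


def find_paths(edges, source, target, max_hops=6):
    """Return every cycle-free path from source to target within max_hops edges."""
    paths = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted on this branch
        for neighbor in edges.get(node, []):
            if neighbor not in path:  # never revisit a node on the same path
                queue.append(path + [neighbor])
    return paths


attack_paths = find_paths(EDGES, "0.0.0.0/0", "sensitive-bucket")
for p in attack_paths:
    print(" -> ".join(p))

# "What-if": open a new ingress rule to internal-vm and re-run the traversal.
what_if = {**EDGES, "0.0.0.0/0": EDGES["0.0.0.0/0"] + ["internal-vm"]}
what_if_paths = find_paths(what_if, "0.0.0.0/0", "sensitive-bucket")
print(f"Paths after the change: {len(what_if_paths)}")
```

The what-if step only copies the edge map and adds one edge, which is the same shape of question you would ask a graph database before approving a firewall change.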

Modeling Cloud Infrastructure in a Graph
To start, we need to define our graph schema. This involves identifying the entities (nodes) and the relationships (edges) between them.
Entities (Node Labels):
- :Compute (e.g., EC2, Azure VM, GCP Compute Instance)
- :SecurityGroup
- :IPRange
- :IAMRole
- :DataStore (e.g., S3 Bucket, GCP Cloud Storage, Azure Blob Storage)
- :VPC
- :Identity (e.g., an admin or developer user)
- :Group
Relationships (Edge Types):
- :INGRESS_TO
- :ATTACHED_TO
- :ASSUMES
- :ALLOWS
- :IN_VPC
- :PEERED_WITH
- :MEMBER_OF
- :GRANTS
The flexible schema of a graph database is a huge benefit for multi-cloud environments. You can easily add properties to nodes and edges to handle the unique nuances of AWS, Azure, or GCP without a rigid, predefined table structure.
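A flexible schema still benefits from a lightweight check at ingestion time. The sketch below records which label and edge-type combinations are expected and flags anything else. `ALLOWED_EDGES`, `validate_edge`, and `invalid_edges` are hypothetical helpers, not part of FalkorDB, and the start and end labels chosen for each edge type are assumptions based on how the examples in this article use them.

```python
# Expected (start_label, edge_type, end_label) triples for the schema above.
ALLOWED_EDGES = {
    ("IPRange", "INGRESS_TO", "SecurityGroup"),
    ("SecurityGroup", "ATTACHED_TO", "Compute"),
    ("Compute", "ASSUMES", "IAMRole"),
    ("IAMRole", "ALLOWS", "DataStore"),
    ("Compute", "IN_VPC", "VPC"),
    ("DataStore", "IN_VPC", "VPC"),
    ("VPC", "PEERED_WITH", "VPC"),
}


def validate_edge(start_label, edge_type, end_label):
    """Return True if the triple is one the schema expects."""
    return (start_label, edge_type, end_label) in ALLOWED_EDGES


def invalid_edges(edges):
    """Return the edges whose triple is not in the expected schema."""
    return [edge for edge in edges if not validate_edge(*edge)]


# Provider-specific properties can differ per node without a schema change.
aws_vm = {"label": "Compute", "id": "i-0abc", "platform": "AWS EC2", "ami": "ami-123"}
gcp_vm = {"label": "Compute", "id": "vm-1", "platform": "GCP GCE", "zone": "us-east1-b"}

candidate_edges = [
    ("IPRange", "INGRESS_TO", "SecurityGroup"),
    ("IPRange", "ALLOWS", "DataStore"),  # unexpected: skips the role entirely
]
rejected = invalid_edges(candidate_edges)
print("Rejected edges:", rejected)
```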
Hands-On with FalkorDB
Let’s get hands-on with FalkorDB.
Quickstart with Docker
The easiest way to get started is with Docker.
Open a terminal and run the following command:
docker run -p 6379:6379 -p 3000:3000 -it --rm falkordb/falkordb:latest --requirepass your_strong_password
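This command pulls the FalkorDB image and runs it, exposing port 6379 for the database and port 3000 for the built-in browser, which you can open at http://localhost:3000 to explore the graph. If the password set with --requirepass is enforced, supply the same password when connecting from a client.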
Create a Directory and Initialize the Project
mkdir falkordb_demo
cd falkordb_demo
uv init
uv venv
source .venv/bin/activate
touch requirements.txt
uv add -r requirements.txt
touch falkordb_test.ipynb
Contents of requirements.txt (add these lines before running uv add -r requirements.txt):
falkordb
langchain_community
Connect with a Python Client
You can use the falkordb Python client (falkordb-py) to interact with your database directly, or the FalkorDBGraph wrapper from langchain_community.
falkordb_test.ipynb
import os
from langchain_community.graphs import FalkorDBGraph
import falkordb
# Connect to the database
db = falkordb.FalkorDB(host='localhost', port=6379)
graph = db.select_graph('cspm-graph')
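Alternatively, connect through the LangChain wrapper. The rest of this walkthrough uses this graph object, because it relies on the wrapper's refresh_schema() method and schema attribute: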
FALKORDB_HOST = os.getenv("FALKORDB_HOST", "localhost")
FALKORDB_PORT = int(os.getenv("FALKORDB_PORT", 6379))
# Optional: set FALKORDB_PASSWORD if the server was started with --requirepass.
FALKORDB_PASSWORD = os.getenv("FALKORDB_PASSWORD")
print(f"Connecting to FalkorDB at {FALKORDB_HOST}:{FALKORDB_PORT}...")
try:
    # The FalkorDBGraph wrapper handles the database connection.
    graph = FalkorDBGraph(
        database="cspm-graph",
        host=FALKORDB_HOST,
        port=FALKORDB_PORT,
        password=FALKORDB_PASSWORD,
    )
    print("Connection successful.")
except Exception as e:
    print(f"An error occurred during connection: {e}")
    graph = None
Seed a Sample Graph
The value of a graph-based approach is that it shows how complicated permissions can become on cloud providers. We’ll use a small dataset to demonstrate this, inserting nodes and edges with Cypher queries.
Use Case 1: Detect Internet-Exposed Compute with Data Access
This is a classic CSPM use case. A misconfiguration allows a server to be publicly exposed, and an overly permissive IAM role gives it access to sensitive data. In a tabular view, this would require joining multiple tables and manually tracing the permissions. With a graph, it’s a single traversal query.
The Attack Path
0.0.0.0/0 → Security Group → VM → IAM Role → S3 Bucket
This is the kind of CSPM problem a graph database is built to solve. We want to find any datastore that is reachable from the public internet.
if graph:
    print("Creating sample graph data...")
    creation_query = """
    CREATE (internet:IPRange {cidr: '0.0.0.0/0', description: 'Public Internet'}),
           (sg_web:SecurityGroup {name: 'Web_Server_SG'}),
           (vm_web:Compute {id: 'i-exposed-web-01', platform: 'AWS EC2', type: 'Web Server'}),
           (role_s3:IAMRole {name: 'S3AccessRole', description: 'Allows read/write to sensitive data'}),
           (s3_sensitive:DataStore {name: 'sensitive-data-bucket', type: 'S3'}),
           (internet)-[:INGRESS_TO]->(sg_web)-[:ATTACHED_TO]->(vm_web),
           (vm_web)-[:ASSUMES]->(role_s3)-[:ALLOWS {actions: ['s3:GetObject', 's3:PutObject'], resource: 'arn:aws:s3:::sensitive-data-bucket/*'} ]->(s3_sensitive),
           (sg_ssh:SecurityGroup {name: 'SSH_Access_SG'}),
           (vm_ssh:Compute {id: 'i-exposed-ssh-02', platform: 'GCP GCE', type: 'Jump Box'}),
           (internet)-[:INGRESS_TO {protocol: 'TCP', port: 22}]->(sg_ssh)-[:ATTACHED_TO]->(vm_ssh)
    """
    graph.query(creation_query)
    print("Graph data created successfully.")
    graph.refresh_schema()
    print(f"Graph schema refreshed. New schema includes: {graph.schema}")

Response:
Creating sample graph data...
Graph data created successfully.
Graph schema refreshed. New schema includes: Node properties: [[OrderedDict({'label': 'DataStore', 'keys': ['name', 'type']})], [OrderedDict({'label': 'IAMRole', 'keys': ['name', 'description']})], [OrderedDict({'label': 'IPRange', 'keys': ['cidr', 'description']})], [OrderedDict({'label': 'Compute', 'keys': ['id', 'platform', 'type']})], [OrderedDict({'label': 'SecurityGroup', 'keys': ['name']})]]
Relationships properties: [[OrderedDict({'types': 'INGRESS_TO', 'keys': ['protocol', 'port']})], [OrderedDict({'types': 'ALLOWS', 'keys': ['actions', 'resource']})], [OrderedDict({'types': 'ASSUMES', 'keys': []})], [OrderedDict({'types': 'ATTACHED_TO', 'keys': []})]]
Relationships: [[OrderedDict({'end': 'SecurityGroup', 'start': 'IPRange', 'type': 'INGRESS_TO'})], [OrderedDict({'end': 'Compute', 'start': 'SecurityGroup', 'type': 'ATTACHED_TO'})], [OrderedDict({'end': 'IAMRole', 'start': 'Compute', 'type': 'ASSUMES'})], [OrderedDict({'end': 'DataStore', 'start': 'IAMRole', 'type': 'ALLOWS'})]]
def find_internet_to_sensitive_data_path(graph):
    """
    Executes a Cypher query to find any internet-to-sensitive-data attack paths.
    This answers the natural language prompt: "Show any internet path to sensitive data".
    """
    if not graph:
        print("Graph connection is not available. Cannot perform query.")
        return
    print("\n--- Running internet-to-sensitive-data analysis ---")
    query_string = """
    MATCH (ip:IPRange {cidr: '0.0.0.0/0'})
          -[:INGRESS_TO]->(:SecurityGroup)
          -[:ATTACHED_TO]->(c:Compute)
          -[:ASSUMES]->(r:IAMRole)
          -[:ALLOWS]->(d:DataStore {name: 'sensitive-data-bucket'})
    RETURN c.id, c.platform, r.name, d.name, d.type
    """
    try:
        result = graph.query(query_string)
        if result:
            print("\n!!! DANGER: The following attack paths were found !!!")
            print("-" * 50)
            print(f"| {'Compute ID':<20} | {'Platform':<15} | {'IAM Role':<20} | {'DataStore':<20} |")
            print("-" * 50)
            for row in result:
                compute_id, platform, role_name, datastore_name, datastore_type = row
                print(f"| {compute_id:<20} | {platform:<15} | {role_name:<20} | {datastore_name:<20} |")
            print("-" * 50)
        else:
            print("No critical attack paths found in the current graph.")
    except Exception as e:
        print(f"An error occurred during query execution: {e}")

def find_ssh_exposure(graph):
    """
    Executes a Cypher query to find any public SSH exposures.
    This answers the natural language prompt: "Flag any SSH exposure".
    """
    if not graph:
        print("Graph connection is not available. Cannot perform query.")
        return
    print("\n--- Running SSH exposure analysis ---")
    query_string = """
    MATCH (ip:IPRange {cidr: '0.0.0.0/0'})
          -[:INGRESS_TO {port: 22}]->(sg:SecurityGroup)
          -[:ATTACHED_TO]->(c:Compute)
    RETURN c.id, c.platform, sg.name
    """
    try:
        result = graph.query(query_string)
        if result:
            print("\n!!! DANGER: The following SSH exposures were found !!!")
            print("-" * 50)
            print(f"| {'Compute ID':<20} | {'Platform':<15} | {'Security Group':<20} |")
            print("-" * 50)
            for row in result:
                compute_id, platform, sg_name = row
                print(f"| {compute_id:<20} | {platform:<15} | {sg_name:<20} |")
            print("-" * 50)
        else:
            print("No public SSH exposures found in the current graph.")
    except Exception as e:
        print(f"An error occurred during query execution: {e}")

if graph:
    find_internet_to_sensitive_data_path(graph)
    find_ssh_exposure(graph)
Response:
--- Running internet-to-sensitive-data analysis ---
!!! DANGER: The following attack paths were found !!!
--------------------------------------------------
| Compute ID | Platform | IAM Role | DataStore |
--------------------------------------------------
| i-exposed-web-01 | AWS EC2 | S3AccessRole | sensitive-data-bucket |
--------------------------------------------------
--- Running SSH exposure analysis ---
!!! DANGER: The following SSH exposures were found !!!
--------------------------------------------------
| Compute ID | Platform | Security Group |
--------------------------------------------------
| i-exposed-ssh-02 | GCP GCE | SSH_Access_SG |
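The SSH check hard-codes port 22. A small variation, sketched below, builds the same pattern with a query parameter so one function can cover other sensitive ports. `build_port_exposure_query` is a hypothetical helper, and the sketch assumes the `$port` parameter is supplied through the `params` argument of `graph.query`, which is worth verifying against the client version you use.

```python
def build_port_exposure_query(port):
    """Return (cypher, params) for compute reachable from 0.0.0.0/0 on a port."""
    if not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port!r}")
    cypher = (
        "MATCH (ip:IPRange {cidr: '0.0.0.0/0'})"
        "-[:INGRESS_TO {port: $port}]->(sg:SecurityGroup)"
        "-[:ATTACHED_TO]->(c:Compute) "
        "RETURN c.id, c.platform, sg.name"
    )
    return cypher, {"port": port}


# Build one query per port of interest (22 = SSH, 3389 = RDP).
for sensitive_port in (22, 3389):
    cypher, params = build_port_exposure_query(sensitive_port)
    print(params, cypher)
    # With a live connection (assumed): rows = graph.query(cypher, params)
```

Passing the port as a parameter rather than formatting it into the string keeps the query text constant and avoids quoting mistakes.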
Use Case 2: Least Privilege & Toxic Combinations
Security teams constantly struggle with the question: “Who can reach this database — directly or indirectly?” The graph lets you chase edges across assume-role, peering, and transit relationships to find hidden permission chains. This is a crucial part of CSPM.
if graph:
    print("Creating sample graph data...")
    creation_query = """
    CREATE (admin:Identity {name: 'Alice', type: 'Admin'}),
           (dev:Identity {name: 'Bob', type: 'Developer'}),
           (group_dev:Group {name: 'Developers'}),
           (role_vpc_peering:IAMRole {name: 'VpcPeeringRole'}),
           (vpc_dev:VPC {name: 'dev_vpc', cidr: '10.0.1.0/24'}),
           (vpc_prod:VPC {name: 'prod_vpc', cidr: '10.0.2.0/24'}),
           (compute_in_dev:Compute {id: 'i-dev-01', platform: 'AWS EC2'}),
           (db_prod:DataStore {name: 'prod-db-main', encrypted: false, type: 'MySQL'}),
           (dev)-[:MEMBER_OF]->(group_dev),
           (group_dev)-[:GRANTS]->(role_vpc_peering),
           (compute_in_dev)-[:IN_VPC]->(vpc_dev),
           (role_vpc_peering)-[:ASSUMES]->(compute_in_dev),
           (vpc_dev)-[:PEERED_WITH]->(vpc_prod),
           (db_prod)-[:IN_VPC]->(vpc_prod)
    """
    graph.query(creation_query)
    print("Graph data created successfully.")
    graph.refresh_schema()
    print(f"Graph schema refreshed. New schema includes: {graph.schema}")

Response:
Creating sample graph data...
Graph data created successfully.
Graph schema refreshed. New schema includes: Node properties: [[OrderedDict({'label': 'DataStore', 'keys': ['name', 'encrypted', 'type']})], [OrderedDict({'label': 'Group', 'keys': ['name']})], [OrderedDict({'label': 'IAMRole', 'keys': ['name']})], [OrderedDict({'label': 'Identity', 'keys': ['name', 'type']})], [OrderedDict({'label': 'Compute', 'keys': ['id', 'platform']})], [OrderedDict({'label': 'VPC', 'keys': ['name', 'cidr']})]]
Relationships properties: [[OrderedDict({'types': 'MEMBER_OF', 'keys': []})], [OrderedDict({'types': 'IN_VPC', 'keys': []})], [OrderedDict({'types': 'ASSUMES', 'keys': []})], [OrderedDict({'types': 'PEERED_WITH', 'keys': []})], [OrderedDict({'types': 'GRANTS', 'keys': []})]]
Relationships: [[OrderedDict({'end': 'Group', 'start': 'Identity', 'type': 'MEMBER_OF'})], [OrderedDict({'end': 'IAMRole', 'start': 'Group', 'type': 'GRANTS'})], [OrderedDict({'end': 'Compute', 'start': 'IAMRole', 'type': 'ASSUMES'})], [OrderedDict({'end': 'VPC', 'start': 'VPC', 'type': 'PEERED_WITH'})], [OrderedDict({'end': 'VPC', 'start': 'Compute', 'type': 'IN_VPC'})], [OrderedDict({'end': 'VPC', 'start': 'DataStore', 'type': 'IN_VPC'})]]
def find_indirect_db_access(graph):
    """
    Executes a Cypher query to find any indirect access paths to a production database.
    This answers the natural language prompt: "Which identities can indirectly reach a production database?"
    """
    if not graph:
        print("Graph connection is not available. Cannot perform query.")
        return
    print("\n--- Running Indirect DB Access analysis ---")
    print("Searching for identities that can indirectly reach an unencrypted production database...")
    query_string = """
    MATCH (i:Identity)
          -[:MEMBER_OF]->(:Group)
          -[:GRANTS]->(r:IAMRole)
          -[:ASSUMES]->(c:Compute)
          -[:IN_VPC]->(vpc_a:VPC)
          -[:PEERED_WITH]->(vpc_b:VPC)
          -[:IN_VPC]-(db:DataStore {encrypted: false, type: 'MySQL'})
    RETURN i.name, c.id, vpc_a.name, vpc_b.name, db.name
    """
    try:
        result = graph.query(query_string)
        if result:
            print("\n!!! DANGER: The following indirect access paths were found !!!")
            print("-" * 80)
            print(f"| {'Identity':<15} | {'Compute ID':<15} | {'VPC-A':<15} | {'VPC-B (Peered)':<15} | {'DB Name':<15} |")
            print("-" * 80)
            for row in result:
                identity, compute, vpc_a, vpc_b, db_name = row
                print(f"| {identity:<15} | {compute:<15} | {vpc_a:<15} | {vpc_b:<15} | {db_name:<15} |")
            print("-" * 80)
        else:
            print("No indirect access paths found to unencrypted production databases.")
    except Exception as e:
        print(f"An error occurred during query execution: {e}")

if graph:
    find_indirect_db_access(graph)
Response:
--- Running Indirect DB Access analysis ---
Searching for identities that can indirectly reach an unencrypted production database...
!!! DANGER: The following indirect access paths were found !!!
--------------------------------------------------------------------------------
| Identity | Compute ID | VPC-A | VPC-B (Peered) | DB Name |
--------------------------------------------------------------------------------
| Bob | i-dev-01 | dev_vpc | prod_vpc | prod-db-main |
--------------------------------------------------------------------------------
This query finds every unencrypted MySQL database that an identity (here, Bob) can reach through a fixed chain: a group membership, a granted role, a compute instance, and a VPC peering link. It’s a good example of how a single query can reveal complex permission chains, a key function of CSPM.
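Real environments often nest groups, which the single :MEMBER_OF hop in the query above does not follow. A variable-length pattern is one way to cover that case. The sketch below only builds the query text: the helper name and the `max_group_depth` parameter are assumptions, the sample data contains no nested groups, and the query has not been run against a live FalkorDB instance here.

```python
def build_indirect_access_query(max_group_depth=3):
    """Build a Cypher query that follows 1..max_group_depth MEMBER_OF hops."""
    if not isinstance(max_group_depth, int) or max_group_depth < 1:
        raise ValueError("max_group_depth must be a positive integer")
    # Doubled braces render as literal Cypher braces inside the f-string.
    return f"""
MATCH (i:Identity)
      -[:MEMBER_OF*1..{max_group_depth}]->(:Group)
      -[:GRANTS]->(r:IAMRole)
      -[:ASSUMES]->(c:Compute)
      -[:IN_VPC]->(vpc_a:VPC)
      -[:PEERED_WITH]->(vpc_b:VPC)
      <-[:IN_VPC]-(db:DataStore {{encrypted: false}})
RETURN DISTINCT i.name, r.name, c.id, vpc_b.name, db.name
""".strip()


nested_query = build_indirect_access_query(max_group_depth=3)
print(nested_query)
```

Keeping an upper bound on the hop count is a deliberate choice: unbounded variable-length patterns can become expensive on large graphs.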
Datasets Developers Can Try
To build your own CSPM graph, you’ll need to export your cloud inventory. Here are some great tools and data sources:
- CloudQuery: An open-source tool that extracts configuration data from your cloud environment and exports it to various formats, including JSON and CSV, which can be loaded into FalkorDB.
- Cartography: Developed by Lyft, this tool maps your cloud assets in a graph. You can use its schema as inspiration.
- Native Cloud Exports: You can also use services like AWS Security Hub, Azure Resource Graph, or GCP Cloud Asset Inventory to get raw data for ingestion.
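Whatever the export source, ingestion usually comes down to turning inventory records into node upserts. The sketch below assumes a simplified, hypothetical record shape (`label`, `key`, `props`). Real CloudQuery or Cloud Asset Inventory exports use different fields and need their own mapping. It produces parameterized MERGE statements so that re-running an import does not duplicate nodes.

```python
import re

# Labels and property names are interpolated into the query, so restrict them.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def node_upsert(record):
    """Build a parameterized MERGE statement for one inventory record."""
    label, key, props = record["label"], record["key"], record["props"]
    for name in (label, key, *props):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if key not in props:
        raise KeyError(f"key property {key!r} missing from props")
    extra = sorted(p for p in props if p != key)
    cypher = f"MERGE (n:{label} {{{key}: ${key}}})"
    if extra:
        cypher += " SET " + ", ".join(f"n.{p} = ${p}" for p in extra)
    return cypher, dict(props)


# Hypothetical records, as they might look after mapping a raw export.
records = [
    {"label": "Compute", "key": "id",
     "props": {"id": "i-0abc", "platform": "AWS EC2"}},
    {"label": "DataStore", "key": "name",
     "props": {"name": "logs-bucket", "type": "S3", "encrypted": True}},
]

statements = [node_upsert(r) for r in records]
for cypher, params in statements:
    print(cypher, params)
    # With a live connection (assumed): graph.query(cypher, params)
```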
Beyond Queries: Building CSPM on Graphs
The power of a graph-based approach to CSPM goes beyond ad-hoc queries:
- Save Queries As Alerts: Turn your critical queries into recurring jobs that send alerts when a new security vulnerability is found.
- Integrate with CI/CD: Use FalkorDB results to enforce security guardrails, preventing new resources from being deployed with critical misconfigurations.
- Plug into SIEM/SOAR: Use FalkorDB as a context-enrichment layer for your Security Information and Event Management (SIEM) or Security Orchestration, Automation, and Response (SOAR) pipelines.
- Visualization: Use the FalkorDB browser or other graph visualization tools to show attack paths to stakeholders.
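The first two ideas in the list above (saved queries as alerts, and a CI/CD guardrail) can be sketched as a small policy runner. Everything here is illustrative: `Policy`, `evaluate`, and `ci_exit_code` are hypothetical names, and `fake_run_query` stands in for a real `graph.query` call so the sketch runs without a database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    severity: str  # "critical" or "warning"
    cypher: str


POLICIES = [
    Policy(
        "public-ssh", "critical",
        "MATCH (:IPRange {cidr: '0.0.0.0/0'})-[:INGRESS_TO {port: 22}]->"
        "(:SecurityGroup)-[:ATTACHED_TO]->(c:Compute) RETURN c.id",
    ),
    Policy(
        "unencrypted-datastore", "warning",
        "MATCH (d:DataStore {encrypted: false}) RETURN d.name",
    ),
]


def evaluate(policies, run_query):
    """Run each policy and return {policy name: rows} for policies with findings."""
    findings = {}
    for policy in policies:
        rows = list(run_query(policy.cypher))
        if rows:
            findings[policy.name] = rows
    return findings


def ci_exit_code(policies, findings):
    """Return 1 if any critical policy produced findings, otherwise 0."""
    critical = {p.name for p in policies if p.severity == "critical"}
    return 1 if critical & set(findings) else 0


def fake_run_query(cypher):
    """Stand-in for graph.query: pretends one SSH exposure exists."""
    return [["i-exposed-ssh-02"]] if "port: 22" in cypher else []


findings = evaluate(POLICIES, fake_run_query)
exit_code = ci_exit_code(POLICIES, findings)
print(findings, "exit code:", exit_code)
```

In a pipeline, the exit code would fail the build on critical findings, while the same findings dictionary could be forwarded to an alerting channel on a schedule.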
Conclusion
Graphs unlock contextual visibility for CSPM, allowing security teams to move beyond fragmented, tabular data and see the complete picture of their cloud infrastructure. FalkorDB’s openCypher support and high performance make it an ideal engine for this type of cloud-scale posture analysis.
Your next steps are simple:
- Try the FalkorDB Docker instance, or sign up for the hosted cloud service.
- Import a small sample of your own cloud inventory data.
- Extend the queries presented here to fit your specific compliance and security requirements.
Note: Content contains the views of the contributing authors and not Towards AI.