Node Classification in Dynamic Graphs
Author(s): Kalpan Dharamshi
Machine Learning on Graphs
Machine learning on graphs, often referred to as Graph Machine Learning (GML), is a rapidly growing field that applies machine learning techniques to data structured as graphs, where entities (nodes or vertices) and their relationships (edges) are explicitly modeled. Traditional machine learning assumes data points are independent, but GML leverages the relational information inherent in graph structures to make more accurate predictions. Key areas include node classification (predicting the type or category of a node, e.g., identifying bot accounts on social media), link prediction (predicting missing or future connections, e.g., friend recommendations), and graph classification (classifying entire graphs, e.g., categorizing molecules by their properties). Methods like Graph Neural Networks (GNNs), which learn representations by aggregating information from a node’s neighbors, are central to this field, enabling powerful applications in drug discovery, social network analysis, recommender systems, and fraud detection.
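To make neighbor aggregation concrete, here is a minimal sketch of a single mean-aggregation step, the core operation behind many GNN layers. The three-node toy graph is hypothetical and purely for illustration:
# Minimal sketch of one mean-aggregation step, the core idea behind many GNNs.
# The 3-node toy graph here is hypothetical and only for illustration.
import torch

x = torch.tensor([[1.0], [2.0], [3.0]])            # one feature per node
edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]])  # directed edges: 0->1, 1->2, 2->0

agg = torch.zeros_like(x)
deg = torch.zeros(x.size(0), 1)
for src, dst in edge_index.t().tolist():
    agg[dst] += x[src]          # sum messages from incoming neighbors
    deg[dst] += 1
h = agg / deg.clamp(min=1)      # mean of neighbor features = new representation
print(h)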
What is unique in machine learning on graphs?
A central challenge unique to machine learning on graphs is the dynamic, constantly evolving nature of real-world graph data. Unlike static datasets, the entities (nodes) and their relationships (edges) frequently change over time. For instance, in a social network, new connections are formed while older ones become inactive. This evolution means that a node’s properties, and consequently its true classification or label (e.g., whether a user is an influencer, a customer, or an anomaly), may not be permanent. GML models must therefore be robust to these temporal changes and require either continuous retraining or a dynamic approach (such as Dynamic Graph Neural Networks) so that node classifications remain relevant and accurate to the current state of the network.
How can we perform node classification for Anomaly Detection?
A common and critical application of Machine Learning on Graphs is Anomaly Detection, which involves identifying rare or fraudulent entities. We specifically perform node classification to determine the anomalous nodes — for example, marking a user account as a bot or a transaction as fraudulent. In the financial world, this technique is leveraged to detect suspicious transaction outliers within a network of accounts and take appropriate actions on those fraudulent nodes.
Our Dataset
It was challenging to find a dataset for anomaly detection, so we decided to generate synthetic data for our use case of fraud detection. We leverage the PyTorch Geometric (pytorch-geometric) library as the base for our experiment.
Datasets used for fraud detection are inherently and severely imbalanced, meaning the distribution of classes is highly skewed: non-fraudulent examples vastly outnumber fraudulent ones. In the context of graph-based anomaly detection, fraudulent nodes are typically far fewer than legitimate, non-fraudulent nodes. For instance, in our graph, only about 0.5% of the nodes are fraudulent.
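As a hedged aside (the training loop later in this article uses an unweighted loss), one common remedy for such imbalance is to up-weight the rare class, for example via the pos_weight argument of BCEWithLogitsLoss. The counts below match our synthetic graph, but the logits and targets are illustrative placeholders:
# Sketch: up-weighting the rare fraud class in a binary loss.
# The 4000/20 split matches the synthetic graph below; the logits and
# targets here are random placeholders, purely for illustration.
import torch

num_normal, num_fraud = 4000, 20
pos_weight = torch.tensor([num_normal / num_fraud])  # ~200x weight on fraud errors
weighted_loss = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(10)    # hypothetical per-node fraud scores
targets = torch.zeros(10)
targets[0] = 1.0            # one "fraud" node among ten
print(weighted_loss(logits, targets))
With that caveat noted, we can generate the graph itself: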
# NOTE: This script requires the 'torch_geometric' library to run successfully.
# If you don't have it, you can install it using:
#   pip install torch_geometric
import numpy as np
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GAT
def generate_fraud_graph(
    num_normal_nodes: int = 1000,
    num_fraud_nodes: int = 20,
    num_features: int = 5,
    normal_degree: int = 4,
    fraud_density_factor: int = 15
) -> Data:
    """
    Generates a synthetic graph dataset for fraud detection using PyTorch Geometric's Data object.

    The dataset simulates a financial network where a small, dense cluster of
    fraudulent nodes is embedded within a larger, sparser network of normal nodes.

    Args:
        num_normal_nodes: Number of non-fraudulent nodes.
        num_fraud_nodes: Number of fraudulent nodes (the anomaly cluster).
        num_features: Dimension of node feature vectors.
        normal_degree: Average number of connections for normal nodes.
        fraud_density_factor: Multiplier for connectivity within the fraud cluster.

    Returns:
        A torch_geometric.data.Data object containing the synthetic graph.
    """
    total_nodes = num_normal_nodes + num_fraud_nodes
    print(f"Generating graph with {total_nodes} nodes ({num_fraud_nodes} fraudulent).")

    # 1. Generate Node Features (x)
    # Normal features (e.g., account age, daily transactions) - centered around 1.0.
    # Use a low variance to signify 'normal' stable behavior.
    x_normal = np.random.normal(loc=1.0, scale=0.1, size=(num_normal_nodes, num_features)).astype(np.float32)
    # Fraudulent features - shifted significantly (e.g., higher average transaction amount).
    # Use a higher variance to signify 'anomalous' unstable behavior.
    x_fraud = np.random.normal(loc=5.0, scale=0.8, size=(num_fraud_nodes, num_features)).astype(np.float32)
    # Combine features
    x_np = np.concatenate([x_normal, x_fraud], axis=0)
    x = torch.tensor(x_np, dtype=torch.float)

    # 2. Generate Node Labels (y)
    # 0 for normal, 1 for fraud
    y_normal = np.zeros(num_normal_nodes, dtype=np.longlong)
    y_fraud = np.ones(num_fraud_nodes, dtype=np.longlong)
    y_np = np.concatenate([y_normal, y_fraud], axis=0)
    y = torch.tensor(y_np, dtype=torch.long)

    # 3. Generate Edges (edge_index)
    # --- A. Normal Transactions (Sparse Random Connections) ---
    normal_edges = []
    # Create random edges for the normal population
    for i in range(num_normal_nodes):
        # Sample 'normal_degree' random neighbors from the normal population
        neighbors = np.random.choice(num_normal_nodes, normal_degree, replace=False)
        for j in neighbors:
            if i != j:
                normal_edges.append((i, j))

    # --- B. Fraudulent Subgraph (Dense Cluster) ---
    fraud_start_index = num_normal_nodes
    fraud_end_index = total_nodes
    fraud_edges = []
    # Connect fraudulent nodes heavily among themselves and occasionally to normal nodes
    for i in range(fraud_start_index, fraud_end_index):
        # High internal connectivity within the fraud group
        internal_neighbors = np.random.choice(
            np.arange(fraud_start_index, fraud_end_index),
            fraud_density_factor,  # Dense connections
            replace=True
        )
        for j in internal_neighbors:
            if i != j:
                fraud_edges.append((i, j))
        # Sparse connection to the normal population (simulating initial compromise)
        if np.random.rand() < 0.2:  # 20% chance of connecting to a normal node
            random_normal_node = np.random.randint(0, num_normal_nodes)
            fraud_edges.append((i, random_normal_node))
            fraud_edges.append((random_normal_node, i))  # Bi-directional link

    # --- C. Combine and Format Edges ---
    all_edges_np = np.array(normal_edges + fraud_edges).T
    # We keep the raw (source, target) pairs. In a real scenario transaction
    # edges are directed; duplicates could be removed (or the graph made
    # undirected) with PyG transforms such as ToUndirected.
    # Note: PyG's Data object expects the edge index in (2, num_edges) format.
    edge_index = torch.tensor(all_edges_np, dtype=torch.long)

    # 4. Create the PyTorch Geometric Data Object
    data = Data(x=x, edge_index=edge_index, y=y)

    print("\n--- Generated Data Object Summary ---")
    print(data)
    print(f"Node features (x) shape: {data.x.shape}")
    print(f"Edge index shape: {data.edge_index.shape}")
    print(f"Number of fraud nodes (Label 1): {data.y.sum().item()}")
    return data
torch.manual_seed(42)
np.random.seed(42)
max_normal_nodes = 4000
max_fraud_nodes = 20
synthetic_data = generate_fraud_graph(
    num_normal_nodes=max_normal_nodes,
    num_fraud_nodes=max_fraud_nodes,
    num_features=5
)
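Since everything above is synthetic, it is worth a quick sanity check that the generated labels really are imbalanced. This short snippet, run right after the generation call, prints the fraud ratio:
# Sanity check: roughly 0.5% of the generated nodes carry the fraud label.
fraud_ratio = synthetic_data.y.float().mean().item()
print(f"Fraud ratio: {fraud_ratio:.4%} "
      f"({int(synthetic_data.y.sum())} of {synthetic_data.num_nodes} nodes)")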
Train a GAT model
We train a graph attention network (GAT) from the PyTorch Geometric library. Its multi-head attention mechanism assigns higher weights to the neighbors and features that matter most for classifying a node, and it often outperforms a plain graph convolutional network (GCN) on tasks like this.
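For intuition about those weights, PyG's GATConv layer can return its learned attention coefficients. The following standalone sketch uses random placeholder tensors, not our fraud graph, to show how they might be inspected:
# Sketch: inspecting GAT attention coefficients on placeholder inputs.
# The graph and features here are random, purely for illustration.
import torch
from torch_geometric.nn import GATConv

conv = GATConv(in_channels=5, out_channels=8, heads=2)
x = torch.randn(4, 5)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])
out, (att_edges, alpha) = conv(x, edge_index, return_attention_weights=True)
print(alpha.shape)  # one coefficient per edge (self-loops included) per head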
We use percentile logic to determine outliers in our network.
percentile = 0.995

# 3. Model setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GAT(in_channels=5, hidden_channels=2, num_layers=2, out_channels=1, act_first=True).to(device)
data = synthetic_data.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Note: with a 1-D score vector and float targets, CrossEntropyLoss computes a
# soft cross-entropy over all node scores jointly; BCEWithLogitsLoss would be
# the conventional per-node binary loss.
loss_fn = torch.nn.CrossEntropyLoss()

def accuracy(pred_y, y):
    return (pred_y == y).sum() / len(y)

# 4. Training loop
def train(grad_flag):
    model.train()
    optimizer.zero_grad()
    z = model(data.x, data.edge_index)
    z = z[:, 0]
    y = data.y.to(torch.float)
    loss = loss_fn(z, y)
    # Flag the top (1 - percentile) fraction of scores as fraud (label 1).
    # Predictions are computed on a detached copy so the in-place thresholding
    # does not interfere with backpropagation.
    threshold = torch.quantile(z.detach(), percentile)
    preds = (z.detach() > threshold).float()
    acc = accuracy(preds, y)
    if grad_flag:
        loss.backward()
        optimizer.step()
    return loss, acc

# Train the model (the gradient step is skipped whenever epoch % 10 == 0)
loss_list = []
acc_list = []
for epoch in range(1, 201):
    loss, acc = train(epoch % 10)
    loss_list.append(loss.item())
    acc_list.append(acc.item())
    print(f'Epoch : {epoch} Loss : {loss}, Accuracy: {acc}')
The model achieves an accuracy of more than 95%, which is sufficient for our experimentation purposes (though see the note on imbalance-aware metrics after the training log).
Epoch : 1 Loss : 265.2370500564575, Accuracy: 0.945024847984314
Epoch : 2 Loss : 258.57829761505127, Accuracy: 0.945024847984314
Epoch : 3 Loss : 252.07925510406494, Accuracy: 0.945024847984314
Epoch : 4 Loss : 245.739972114563, Accuracy: 0.945024847984314
Epoch : 5 Loss : 239.56012439727783, Accuracy: 0.945024847984314
Epoch : 6 Loss : 233.5388650894165, Accuracy: 0.945024847984314
...
Epoch : 170 Loss : 60.2620906829834, Accuracy: 0.9549751281738281
Epoch : 171 Loss : 60.2620906829834, Accuracy: 0.9549751281738281
Epoch : 172 Loss : 60.25451850891113, Accuracy: 0.9549751281738281
Epoch : 173 Loss : 60.24767017364502, Accuracy: 0.9549751281738281
Epoch : 174 Loss : 60.241437911987305, Accuracy: 0.9549751281738281
Epoch : 175 Loss : 60.23574447631836, Accuracy: 0.9549751281738281
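As an aside, accuracy alone is a weak signal on data this imbalanced: with only ~0.5% fraud nodes, a model that predicts "normal" everywhere would already score about 99.5%. A small helper, an illustrative addition rather than part of the original script, computes precision and recall on the fraud class instead:
# Sketch: precision/recall on the fraud class, more telling than raw accuracy
# under heavy class imbalance. An illustrative helper, assuming 0/1 tensors.
def fraud_precision_recall(pred, target):
    tp = ((pred == 1) & (target == 1)).sum().item()
    fp = ((pred == 1) & (target == 0)).sum().item()
    fn = ((pred == 0) & (target == 1)).sum().item()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Usage, e.g. inside train() after thresholding:
#   precision, recall = fraud_precision_recall(preds, y)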
Add a new node to the graph
We add a new node to the existing graph and establish connections to existing nodes.
def add_node_to_graph(data: Data, new_x: torch.Tensor, new_y: torch.Tensor, connections: list) -> Data:
    """
    Adds a single node and its connections to an existing PyTorch Geometric Data object.

    Args:
        data: The existing torch_geometric.data.Data object.
        new_x: A 1D tensor of features for the new node (size num_features).
        new_y: A 1D tensor containing the label (0 or 1) for the new node.
        connections: A list of indices of existing nodes to connect the new node to.

    Returns:
        The updated torch_geometric.data.Data object.
    """
    num_existing_nodes = data.num_nodes
    new_node_index = num_existing_nodes

    # 1. Update features (x)
    # The new_x must be unsqueezed to maintain the (N, F) shape after concatenation
    data.x = torch.cat([data.x, new_x.unsqueeze(0)], dim=0)

    # 2. Update labels (y)
    data.y = torch.cat([data.y, new_y], dim=0)

    # 3. Update edges (edge_index)
    new_edges = []
    for target_node_index in connections:
        # Add (new_node -> target) and (target -> new_node) for undirected graph update
        new_edges.append([new_node_index, target_node_index])
        new_edges.append([target_node_index, new_node_index])
    if new_edges:
        # Convert list of edges to the required (2, num_edges) format
        new_edges_tensor = torch.tensor(new_edges, dtype=torch.long).T
        data.edge_index = torch.cat([data.edge_index, new_edges_tensor], dim=1)

    print(f"\n--- Node {new_node_index} Added Successfully ---")
    print(f"New total nodes: {data.num_nodes}, New total edges: {data.edge_index.size(1)}")
    return data
We will assign a non-fraud label (0) to the new node and add it to our graph, even though its features and connections resemble those of the fraud nodes.
# 2. Define a new node to add (e.g., a new fraudulent account)
num_features = synthetic_data.x.size(1)

# New node features: high average features (like fraud features)
new_fraud_features = torch.tensor(
    np.random.normal(loc=6.0, scale=0.5, size=(num_features,)),
    dtype=torch.float
)

# New node label: 0 (Non-Fraudulent) assigned to a fraud-like node
new_fraud_label = torch.tensor([0], dtype=torch.long)

# New node connections: Connect to 3 existing fraud nodes (indices 4000 to 4019)
# and 1 random normal node (index 0 to 3999) to simulate a real-world pattern.
fraud_indices = np.arange(max_normal_nodes, max_normal_nodes + max_fraud_nodes)
# Select 3 random fraud connections and 1 random normal connection
connections_to_add = list(np.random.choice(fraud_indices, 3, replace=False))
connections_to_add.append(np.random.randint(0, max_normal_nodes))

print(f"\nNew node features: {new_fraud_features.tolist()[:3]}")
print(f"Connecting new node to existing nodes: {connections_to_add}")

# 3. Add the new node
synthetic_data = add_node_to_graph(
    data=synthetic_data,
    new_x=new_fraud_features,
    new_y=new_fraud_label,
    connections=connections_to_add
)
Experiment Results
We re-run the graph through the trained model and check whether the correct label has been assigned to the node.
new_node_index = max_fraud_nodes + max_normal_nodes
# Re-run the full graph through the trained model to score every node.
scores = model(synthetic_data.x, synthetic_data.edge_index)
scores = scores[:, 0]
threshold = torch.quantile(scores, percentile)
i1 = torch.argwhere(scores > threshold)
i0 = torch.argwhere(scores < threshold)
scores[i1] = 1
scores[i0] = 0
# Overwrite the stored labels with the model's predictions.
synthetic_data.y = scores

# 4. Final summary
print("\n--- Final Data Object Summary ---")
print(synthetic_data)
print(f"Final Node features (x) shape: {synthetic_data.x.shape}")
print(f"Final Edge index shape: {synthetic_data.edge_index.shape}")
print(f"Final Number of fraud nodes (Label 1): {synthetic_data.y.sum().item()}")
print('Node classification : ', scores[new_node_index])
--- Final Data Object Summary ---
Data(x=[4021, 5], edge_index=[2, 16293], y=[4021])
Final Node features (x) shape: torch.Size([4021, 5])
Final Edge index shape: torch.Size([2, 16293])
Final Number of fraud nodes (Label 1): 21.0
Node classification : tensor(1., grad_fn=<SelectBackward0>)
The new node’s classification has been updated from Non-Fraud (0) to Fraud (1), indicating that the trained model correctly reclassified it based on its fraud-like features and connections. The final number of fraud nodes in the network has increased from 20 to 21.
Conclusion
The future of graph-based machine learning lies in handling dynamic graphs, where the continuous evolution of the network structure complicates traditional modeling. Graph Neural Networks (GNNs) offer an effective and scalable solution. Their inherent inductive nature allows them to easily incorporate new nodes and relationships. This capability is crucial for continuously monitoring and updating the classification of existing entities, ensuring the model’s predictions remain relevant and accurate across the entire temporal lifecycle of the graph.
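To make that monitoring idea concrete, here is a hedged sketch of how each newly arriving node could be scored as it joins the graph. It reuses add_node_to_graph, the trained model, and the 0.995 quantile threshold from the experiment above; the wrapper itself is illustrative, not part of the original script:
# Sketch: score each newly arriving node with the already-trained model.
# Reuses add_node_to_graph and model from above; this wrapper is illustrative.
def score_new_node(data, model, new_x, connections, q=0.995):
    placeholder_y = torch.tensor([0], dtype=data.y.dtype)  # label unknown on arrival
    data = add_node_to_graph(data, new_x, placeholder_y, connections)
    model.eval()
    with torch.no_grad():
        scores = model(data.x, data.edge_index)[:, 0]
    threshold = torch.quantile(scores, q)
    is_fraud = bool(scores[-1] > threshold)  # the new node is the last index
    return data, is_fraud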
Hope you liked the article and learned something new today!
The entire code for the experiment can be found on GitHub.