FastAPI Observability Lab with Prometheus and Grafana: Complete Guide
Author(s): Faizulkhan
Originally published on Towards AI.
Table of Contents
- Lab Overview & Concept
- Project Architecture
- Code Deep Dive
- Prometheus Operations
- Grafana Operations
- FastAPI Metrics: Complete Reference
- Metrics Counting Process
- Practical Examples
Lab Overview & Concept
What is This Lab?
This is a hands-on observability lab designed to learn how to monitor a FastAPI application using industry-standard tools: Prometheus for metrics collection and Grafana for visualization. The lab demonstrates real-world observability patterns that we can apply to production applications.
Learning Objectives
By completing this lab, we will learn:
- How to instrument a FastAPI application with Prometheus metrics.
- How Prometheus scrapes and stores metrics from applications.
- How to build Grafana dashboards using PromQL queries.
- How to monitor key application metrics:
- Request rates (traffic)
- Latency percentiles (performance)
- Error rates (reliability)
- Status code breakdowns (health)
Lab Concept: The Three Pillars of Observability
This lab focuses on metrics, one of the three pillars of observability:
- Metrics: Quantitative measurements over time (this lab).
- Logs: Discrete events with timestamps.
- Traces: Request flows through distributed systems.
Why FastAPI + Prometheus + Grafana?
- FastAPI: Modern, high-performance Python web framework.
- Prometheus: Industry-standard metrics collection and storage system.
- Grafana: Powerful visualization and alerting platform.
Together, they form a complete observability stack that is widely used in production environments.
Project Architecture
System Components
┌─────────────────┐
│ FastAPI App │ ← Exposes /metrics endpoint
│ (Port 8000) │
└────────┬────────┘
│ HTTP GET /metrics
│ (every 15 seconds)
▼
┌─────────────────┐
│ Prometheus │ ← Scrapes and stores metrics
│ (Port 9090) │
└────────┬────────┘
│ PromQL Queries
│ (via HTTP API)
▼
┌─────────────────┐
│ Grafana │ ← Visualizes metrics in dashboards
│ (Port 3030) │
└─────────────────┘
Project Structure
fastapi_observability_lab/
├── app/
│ └── main.py # FastAPI application with metrics instrumentation
├── docker-compose.yml # Orchestrates all three services
├── Dockerfile # Builds the FastAPI application container
├── prometheus.yml # Prometheus scrape configuration
├── requirements.txt # Python dependencies
├── metrics_cheatsheet.md # Quick PromQL reference
├── fastapi_metric_explained.md # Detailed metric explanations
└── README.md # Setup and usage instructions
Technology Stack
Application Layer:
- Python 3.11
- FastAPI 0.115.0
- Uvicorn (ASGI server)
Observability Layer:
- prometheus-client 0.20.0 (Python Prometheus client library)
- prometheus-fastapi-instrumentator 6.1.0 (automatic FastAPI instrumentation)
Infrastructure:
- Docker & Docker Compose
- Prometheus (latest)
- Grafana (latest)
Code Deep Dive
Application Entry Point: app/main.py
Let’s break down the FastAPI application code section by section:
1. Imports and Dependencies
from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
from prometheus_client import Counter, Histogram
from prometheus_fastapi_instrumentator import Instrumentator
from typing import List, Dict
import logging
import random
import time
Key Components:
- prometheus_client: provides Counter and Histogram for custom metrics
- prometheus_fastapi_instrumentator: automatically instruments FastAPI routes
- asynccontextmanager: manages the application lifecycle (startup/shutdown)
2. Logging Configuration
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("fastapi-observability")
Purpose: Structured logging for application events. While this lab focuses on metrics, logging complements observability.
Log output from the FastAPI container:
2025-12-03 06:28:13,111 - fastapi-observability - INFO - Creating todo with title=todo20
INFO: 192.168.176.1:54714 - "POST /todos?title=todo20 HTTP/1.1" 200 OK
INFO: 192.168.176.2:44812 - "GET /metrics HTTP/1.1" 200 OK
2025-12-03 06:28:24,979 - fastapi-observability - INFO - Fetching all todos
INFO: 192.168.176.1:38270 - "GET /todos HTTP/1.1" 200 OK
INFO: 192.168.176.2:38074 - "GET /metrics HTTP/1.1" 200 OK
3. Custom Application Metrics
REQUEST_COUNTER = Counter(
"app_requests_total",
"Total number of processed requests in FastAPI app",
["endpoint", "method", "http_status"],
)
REQUEST_LATENCY = Histogram(
"app_request_latency_seconds",
"Latency of FastAPI requests in seconds",
["endpoint", "method"],
)
Explanation:
REQUEST_COUNTER:
- Type: Counter (monotonically increasing)
- Metric name: app_requests_total
- Labels: endpoint, method, http_status
- Purpose: track total requests per endpoint, method, and status code
- Example:
app_requests_total{endpoint="/todos", method="GET", http_status="200"} 125
REQUEST_LATENCY:
- Type: Histogram (bucketed distribution)
- Metric name: app_request_latency_seconds
- Labels: endpoint, method
- Purpose: measure the request duration distribution
- Creates _bucket, _count, and _sum metrics automatically
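To make the two metric types concrete, here is a toy, pure-Python sketch of their semantics (hypothetical stand-in classes, not the real prometheus_client API): a counter only ever increases per label combination, and a histogram increments every bucket whose bound is at least the observed value.

```python
# Toy stand-ins illustrating Counter and Histogram semantics.
# (Hypothetical minimal classes for illustration, not prometheus_client.)

class ToyCounter:
    """Monotonically increasing value per unique label combination."""
    def __init__(self):
        self.values = {}  # (endpoint, method, http_status) -> float

    def inc(self, labels, amount=1.0):
        self.values[labels] = self.values.get(labels, 0.0) + amount

class ToyHistogram:
    """Cumulative buckets: every bucket with le >= value is incremented."""
    def __init__(self, buckets=(0.1, 0.25, 0.5, 1.0, float("inf"))):
        self.buckets = {le: 0 for le in buckets}
        self.count = 0    # exported as _count
        self.total = 0.0  # exported as _sum

    def observe(self, value):
        self.count += 1
        self.total += value
        for le in self.buckets:
            if value <= le:
                self.buckets[le] += 1

requests = ToyCounter()
latency = ToyHistogram()

requests.inc(("/todos", "GET", "200"))
requests.inc(("/todos", "GET", "200"))
latency.observe(0.15)  # lands in the 0.25, 0.5, 1.0 and +Inf buckets

print(requests.values[("/todos", "GET", "200")])    # 2.0
print(latency.buckets[0.25], latency.buckets[0.1])  # 1 0
```

This mirrors what the real library does under the hood: labels select a child time series, and each observation updates count, sum, and all qualifying buckets.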
4. Application Lifespan Management
@asynccontextmanager
async def lifespan(app: FastAPI):
logger.info("🚀 FastAPI application starting up")
yield
logger.info("🛑 FastAPI application shutting down")
Purpose: Lifecycle hooks for startup and shutdown operations. Useful for:
- Database connections
- Background tasks
- Resource cleanup
5. FastAPI App Initialization
app = FastAPI(
title="FastAPI Observability Lab",
description="FastAPI app instrumented with Prometheus & Grafana",
version="1.0.0",
lifespan=lifespan,
)
Purpose: Creates the FastAPI application instance with metadata.
6. Automatic Instrumentation
Instrumentator().instrument(app).expose(app, endpoint="/metrics")
What This Does:
The prometheus-fastapi-instrumentator library automatically:
- Wraps all route handlers to capture:
- Request count
- Request duration
- Status codes
- HTTP methods
- Exposes a /metrics endpoint that returns Prometheus-formatted metrics:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{handler="/",method="GET",status="2xx"} 10.0
http_requests_total{handler="/todos",method="GET",status="2xx"} 5.0
...
- Creates standard metrics:
- http_requests_total (counter)
- http_request_duration_seconds_bucket (histogram buckets)
- http_request_duration_seconds_count (histogram count)
- http_request_duration_seconds_sum (histogram sum)
7. Request Timer Helper Class
class RequestTimer:
def __init__(self, endpoint: str, method: str):
self.endpoint = endpoint
self.method = method
self.start = None
def __enter__(self):
self.start = time.time()
return self
def __exit__(self, exc_type, exc_val, exc_tb):
elapsed = time.time() - self.start
REQUEST_LATENCY.labels(endpoint=self.endpoint, method=self.method).observe(elapsed)
Purpose: Context manager for manual latency measurement. Uses Python’s context manager protocol (__enter__/__exit__) to automatically record request duration.
Usage Pattern:
with RequestTimer(endpoint, method):
# ... route handler code ...
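The timer can be exercised on its own. This self-contained sketch swaps REQUEST_LATENCY for a hypothetical stub so it runs without prometheus_client; the RequestTimer body is the same as above.

```python
import time

# Stub standing in for the real REQUEST_LATENCY histogram so this example
# is self-contained; it simply collects observed durations.
class StubHistogram:
    def __init__(self):
        self.observations = []

    def labels(self, endpoint, method):
        return self

    def observe(self, elapsed):
        self.observations.append(elapsed)

REQUEST_LATENCY = StubHistogram()

class RequestTimer:
    """Context manager that records elapsed wall-clock time on exit."""
    def __init__(self, endpoint, method):
        self.endpoint = endpoint
        self.method = method
        self.start = None

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        elapsed = time.time() - self.start
        REQUEST_LATENCY.labels(endpoint=self.endpoint, method=self.method).observe(elapsed)

with RequestTimer("/todos", "GET"):
    time.sleep(0.05)  # simulated handler work

print(len(REQUEST_LATENCY.observations))        # 1
print(REQUEST_LATENCY.observations[0] >= 0.05)  # True (sleep lasts at least 0.05 s)
```

Note that __exit__ runs even when the handler raises, so error responses are timed too.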
8. Route Handlers
GET / – Health Check
@app.get("/")
async def read_root():
endpoint = "/"
method = "GET"
with RequestTimer(endpoint, method):
logger.info("Root endpoint accessed")
REQUEST_COUNTER.labels(
endpoint=endpoint,
method=method,
http_status="200",
).inc()
return {"status": "healthy"}
Log output from the app container:
2025-12-03 06:45:34,070 - fastapi-observability - INFO - Root endpoint accessed
INFO: 192.168.176.1:35520 - "GET / HTTP/1.1" 200 OK
What Happens:
- Timer starts
- Logs access
- Increments custom counter
- Returns response
- Timer records latency
GET /todos – List Todos
@app.get("/todos")
async def get_todos():
endpoint = "/todos"
method = "GET"
with RequestTimer(endpoint, method):
logger.info("Fetching all todos")
# simulate random latency between 100ms and 400ms
time.sleep(random.uniform(0.1, 0.4))
REQUEST_COUNTER.labels(
endpoint=endpoint,
method=method,
http_status="200",
).inc()
return TODOS
Key Feature: Artificial latency (time.sleep) to make metrics visible in dashboards.
POST /todos – Create Todo
@app.post("/todos")
async def create_todo(title: str):
endpoint = "/todos"
method = "POST"
with RequestTimer(endpoint, method):
logger.info("Creating todo with title=%s", title)
if not title:
REQUEST_COUNTER.labels(
endpoint=endpoint,
method=method,
http_status="400",
).inc()
raise HTTPException(status_code=400, detail="Title cannot be empty")
# ... create todo ...
REQUEST_COUNTER.labels(
endpoint=endpoint,
method=method,
http_status="201",
).inc()
return todo
Key Feature: Different status codes (400 for validation errors, 201 for success).
(Screenshot: response for an empty title.)
GET /error – Error Endpoint
@app.get("/error")
async def trigger_error():
endpoint = "/error"
method = "GET"
with RequestTimer(endpoint, method):
logger.error("Simulated error endpoint accessed")
REQUEST_COUNTER.labels(
endpoint=endpoint,
method=method,
http_status="500",
).inc()
raise HTTPException(status_code=500, detail="Simulated error for observability lab")
Purpose: Intentionally generates 5xx errors for testing error rate monitoring.
error log view from container
2025-12-03 06:55:19,465 - fastapi-observability - ERROR - Simulated error endpoint accessed
INFO: 192.168.176.1:44198 - "GET /error HTTP/1.1" 500 Internal Server Error
2025-12-03 06:55:21,984 - fastapi-observability - ERROR - Simulated error endpoint accessed
INFO: 192.168.176.1:44198 - "GET /error HTTP/1.1" 500 Internal Server Error
2025-12-03 06:55:22,838 - fastapi-observability - ERROR - Simulated error endpoint accessed
Docker Configuration
Dockerfile
FROM python:3.11-slim
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app ./app
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Key Points:
- Uses Python 3.11 slim image (minimal size)
- Sets environment variables for Python behavior
- Installs dependencies from requirements.txt
- Runs the Uvicorn ASGI server on port 8000
docker-compose.yml
version: "3.8"
services:
app:
build: .
container_name: fastapi-observability-app
ports:
- "8000:8000"
networks:
- monitoring
environment:
- PYTHONUNBUFFERED=1
prometheus:
image: prom/prometheus:latest
container_name: fastapi-observability-prometheus
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
ports:
- "9090:9090"
networks:
- monitoring
grafana:
image: grafana/grafana:latest
container_name: fastapi-observability-grafana
ports:
- "3030:3000"
environment:
- GF_SECURITY_ADMIN_USER=admin
- GF_SECURITY_ADMIN_PASSWORD=admin
volumes:
- grafana-storage:/var/lib/grafana
depends_on:
- prometheus
networks:
- monitoring
networks:
monitoring:
driver: bridge
volumes:
grafana-storage:
Architecture Decisions:
- Shared network (monitoring): all services can communicate with each other by service name
- Volume mounts: the Prometheus config file is mounted from the host; Grafana data is persisted in a named volume
- Port mappings:
- App: 8000 (host) → 8000 (container)
- Prometheus: 9090 (host) → 9090 (container)
- Grafana: 3030 (host) → 3000 (container, Grafana's default)
Prometheus Operations
What is Prometheus?
Prometheus is an open-source monitoring and alerting toolkit. It:
- Pulls metrics from targets (HTTP endpoints)
- Stores time-series data in its own database
- Provides PromQL query language for data analysis
- Supports alerting based on metric thresholds
Prometheus Configuration: prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: "fastapi"
metrics_path: "/metrics"
static_configs:
- targets: ["app:8000"]

Configuration Breakdown:
global.scrape_interval: 15s
- Prometheus scrapes metrics every 15 seconds
- Balance between freshness and resource usage
job_name: "fastapi"
- Logical grouping of targets
- Appears in metric labels as
job="fastapi"
metrics_path: "/metrics"
- HTTP path to scrape
- Default is /metrics (the Prometheus standard path)
targets: ["app:8000"]
- Service name from docker-compose network
- Prometheus will scrape
http://app:8000/metrics
How Prometheus Scrapes Metrics
Scrape Process:
- Prometheus sends an HTTP GET to http://app:8000/metrics
- FastAPI responds with Prometheus-formatted text:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{handler="/",method="GET",status="2xx"} 10.0
http_requests_total{handler="/todos",method="GET",status="2xx"} 5.0
...
- Prometheus parses the response
- Metrics stored in time-series database
- Process repeats every 15 seconds
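The scraped payload is plain text, so a simplified parser fits in a few lines. This sketch (stdlib regex only, far less complete than a real Prometheus scraper, which also handles escaping, timestamps, and exemplars) pulls out the metric name, labels, and value from each sample line.

```python
import re

# A simplified parser for the Prometheus text exposition format.
SAMPLE_RE = re.compile(r'^(\w+)(?:\{([^}]*)\})?\s+([0-9.+eE-]+|\+Inf|NaN)$')

def parse_exposition(text):
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = {}
        if raw_labels:
            for pair in raw_labels.split(","):
                k, v = pair.split("=", 1)
                labels[k] = v.strip('"')
        samples.append((name, labels, float(value)))
    return samples

body = '''\
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{handler="/",method="GET",status="2xx"} 10.0
http_requests_total{handler="/todos",method="GET",status="2xx"} 5.0
'''

for name, labels, value in parse_exposition(body):
    print(name, labels["handler"], value)
```

Prometheus does essentially this on every scrape, then appends each (metric, labels, value) sample to its time-series database with the scrape timestamp.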
Prometheus UI Operations
Access: http://localhost:9090
Key Features:
1. Graph tab
- Enter PromQL queries
- Visualize metrics over time
- Example: rate(http_requests_total[5m])
2. Status → Targets
- View scrape target health
- Should show the fastapi job as UP
- Shows last scrape time and errors
3. Status → Configuration
- View the loaded configuration
- Verify scrape settings
4. Alerts tab
- View active alerts (if configured)
- Alert rules are defined in a separate configuration file

PromQL Basics
PromQL (Prometheus Query Language) is used to query metrics:
Counter Rate:
rate(http_requests_total[5m])
- Converts a counter to requests per second
- [5m] = the time window (5 minutes)
Aggregation:
sum by (handler) (rate(http_requests_total[5m]))

- Groups by the handler label
- Sums rates for each endpoint
Histogram Quantile:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
- Calculates 95th percentile latency
- le = "less than or equal" (bucket boundaries)
Grafana Operations
What is Grafana?
Grafana is an open-source analytics and visualization platform. It:
- Connects to data sources (Prometheus, databases, APIs)
- Creates dashboards with panels and visualizations
- Supports alerting based on queries
- Provides rich visualization options (graphs, tables, gauges, etc.)
Initial Setup
Access: http://localhost:3030
Default Credentials:
- Username: admin
- Password: admin (change on first login)
Configuring Prometheus Data Source
Steps:
1. Navigate to Connections → Data sources
2. Click Add data source
3. Select Prometheus
4. Set the URL to http://prometheus:9090 (the Docker service name, not localhost)
5. Click Save & test
- Should show: “Successfully queried the Prometheus API.”

Why http://prometheus:9090?
- Services in Docker Compose can resolve each other by service name
- prometheus resolves to the Prometheus container
- Port 9090 is Prometheus’s default port
Creating Dashboards
Dashboard Structure:
- Dashboard: Collection of panels
- Panel: Single visualization (graph, table, stat, etc.)
- Query: PromQL expression that fetches data
Example Dashboard Creation
1. Create New Dashboard:
- Click Dashboards → New → New dashboard
2. Add Panel:
- Click Add visualization
- Select data source:
Prometheus

3. Configure Query:
- Switch to Code mode (for PromQL)
- Enter query:
rate(http_requests_total[5m])

- Set Format to
Time series
4. Configure Visualization:
- Panel type: Time series
- Title: “Requests per Second”
- Y-axis label: “req/s”
5. Save Dashboard:
- Click Save dashboard
- Give it a name: “FastAPI Observability”


Common Dashboard Panels
1. Requests per Second (Time Series)
Query:
sum by (handler) (rate(http_requests_total[5m]))
Visualization: Time series graph. Purpose: shows traffic patterns over time per endpoint.
2. Status Code Breakdown (Pie Chart)
Query:
sum by (status) (rate(http_requests_total[5m]))
Visualization: Pie chart. Purpose: visual distribution of 2xx, 4xx, and 5xx responses.
3. Error Rate Percentage (Stat)
Query:
100 * sum(rate(http_requests_total{status="5xx"}[5m])) / sum(rate(http_requests_total[5m]))
Visualization: Stat panel with gauge. Purpose: a single number showing the error percentage.
4. P95 Latency (Time Series)
Query:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

Visualization: Time series graph. Purpose: 95th percentile latency over time.
5. Request Count by Endpoint (Bar Chart)
Query:
sum by (handler) (increase(http_requests_total[5m]))
Visualization: Bar chart. Purpose: total requests per endpoint in the last 5 minutes.

Grafana Best Practices
- Use meaningful panel titles
- Add descriptions explaining what each panel shows
- Set appropriate time ranges (last 1 hour, 6 hours, 24 hours)
- Use variables for dynamic dashboards (e.g., endpoint selector)
- Export dashboards as JSON for version control
FastAPI Metrics: Complete Reference
This section provides a comprehensive explanation of all metrics exposed by the FastAPI application.
Metric Categories
The application exposes metrics in two categories:
- Automatic metrics (from prometheus-fastapi-instrumentator)
- Custom metrics (manually defined in code)
Automatic Metrics (from Instrumentator)
These metrics are automatically created by the prometheus-fastapi-instrumentator library.
1. http_requests_total
Type: Counter
Description: Total number of HTTP requests processed since application startup
Labels:
- handler: HTTP route path (e.g., /, /todos, /error)
- method: HTTP method (GET, POST, PUT, DELETE, etc.)
- status: HTTP status code group (2xx, 4xx, 5xx)
- instance: target instance identifier
- job: Prometheus job name (fastapi)
Example Metric:
http_requests_total{handler="/todos",method="GET",status="2xx",instance="app:8000",job="fastapi"} 125.0
Interpretation:
- 125 GET requests to /todos returned 2xx status codes
- Resets to 0 on application restart
Use Cases:
- Total request count (not useful for graphs directly)
- Calculate rates: rate(http_requests_total[5m])
- Calculate increases: increase(http_requests_total[5m])
Common Queries:
Requests per second (all):
rate(http_requests_total[5m])
Requests per second by endpoint:
sum by (handler) (rate(http_requests_total[5m]))

Requests per second by status:
sum by (status) (rate(http_requests_total[5m]))

Total requests in last 5 minutes:
sum(increase(http_requests_total[5m]))
2. http_request_duration_seconds_bucket
Type: Histogram (bucket)
Description: Count of requests that completed within specific latency buckets
Labels:
- handler: HTTP route path
- method: HTTP method
- le: "less than or equal" (bucket boundary in seconds)
- instance: target instance
- job: Prometheus job name
Example Metrics:
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.005"} 10.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.01"} 25.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.025"} 50.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.05"} 75.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.1"} 100.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.25"} 120.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.5"} 125.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="1.0"} 125.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="2.5"} 125.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="5.0"} 125.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="10.0"} 125.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="+Inf"} 125.0
Interpretation:
- le="0.1" bucket = 100 requests completed in ≤ 0.1 seconds
- le="+Inf" bucket = total requests (125)
- Buckets are cumulative (each bucket includes all smaller buckets)
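Because buckets are cumulative, the per-bucket distribution can be recovered by differencing adjacent counts. A small sketch using the /todos numbers above:

```python
# Cumulative histogram buckets converted to per-bucket counts by
# differencing adjacent values (numbers from the /todos example).

cumulative = [
    (0.005, 10), (0.01, 25), (0.025, 50), (0.05, 75),
    (0.1, 100), (0.25, 120), (0.5, 125), (float("inf"), 125),
]

per_bucket = []
prev = 0
for le, count in cumulative:
    per_bucket.append((le, count - prev))  # requests landing in this bucket only
    prev = count

for le, n in per_bucket:
    print(f"<= {le}: {n} requests")
```

The per-bucket counts sum back to the +Inf total (125), and an empty difference (as in the 1.0 s and larger buckets here) means no request fell in that latency range.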
Use Cases:
- Calculate percentiles (P50, P95, P99)
- Understand latency distribution
- Identify slow endpoints
Common Queries:
P50 (median) latency:
histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
P95 latency:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
P95 latency by endpoint:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (handler, le))
3. http_request_duration_seconds_count
Type: Counter (derived from histogram)
Description: Total number of requests (same as _bucket{le="+Inf"})
Labels:
- handler: HTTP route path
- method: HTTP method
- instance: target instance
- job: Prometheus job name
Example Metric:
http_request_duration_seconds_count{handler="/todos",method="GET",instance="app:8000",job="fastapi"} 125.0
Use Cases:
- Total request count (alternative to http_requests_total)
- Calculate average latency (together with _sum)
Common Queries:
Average latency:
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
4. http_request_duration_seconds_sum
Type: Counter
Description: Sum of all request durations (in seconds)
Labels:
- handler: HTTP route path
- method: HTTP method
- instance: target instance
- job: Prometheus job name
Example Metric:
http_request_duration_seconds_sum{handler="/todos",method="GET",instance="app:8000",job="fastapi"} 12.5
Interpretation:
- Total time spent processing 125 requests = 12.5 seconds
- Average = 12.5 / 125 = 0.1 seconds per request
Use Cases:
- Calculate average latency
- Calculate total time spent
Common Queries:
Average latency (seconds):
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
Average latency by endpoint:
sum by (handler) (rate(http_request_duration_seconds_sum[5m])) / sum by (handler) (rate(http_request_duration_seconds_count[5m]))
Custom Metrics (Manual)
These metrics are manually defined in the application code.
5. app_requests_total
Type: Counter
Description: Total number of requests tracked by application code
Labels:
- endpoint: HTTP route path
- method: HTTP method
- http_status: HTTP status code (200, 201, 400, 500, etc.)
Example Metric:
app_requests_total{endpoint="/todos",method="GET",http_status="200"} 100.0
app_requests_total{endpoint="/todos",method="POST",http_status="201"} 25.0
app_requests_total{endpoint="/error",method="GET",http_status="500"} 5.0

Differences from http_requests_total:
- Uses the endpoint label instead of handler
- Uses http_status with exact codes (200, 201, 400, 500) instead of groups (2xx, 4xx, 5xx)
- Manually incremented in code (more control)
Use Cases:
- Application-level request tracking
- Status code-specific monitoring
- Custom business logic metrics
Common Queries:
Requests per second by endpoint:
sum by (endpoint) (rate(app_requests_total[5m]))
Requests per second by status code:
sum by (http_status) (rate(app_requests_total[5m]))
6. app_request_latency_seconds
Type: Histogram
Description: Request latency measured manually in application code
Labels:
- endpoint: HTTP route path
- method: HTTP method
Creates Three Metrics:
- app_request_latency_seconds_bucket (histogram buckets)
- app_request_latency_seconds_count (total count)
- app_request_latency_seconds_sum (sum of durations)
Example Metrics:
app_request_latency_seconds_bucket{endpoint="/todos",method="GET",le="0.1"} 50.0
app_request_latency_seconds_bucket{endpoint="/todos",method="GET",le="0.5"} 100.0
app_request_latency_seconds_count{endpoint="/todos",method="GET"} 100.0
app_request_latency_seconds_sum{endpoint="/todos",method="GET"} 25.0
Differences from http_request_duration_seconds:
- Uses the endpoint label instead of handler
- Manually measured (more control over measurement points)
- Can measure specific code sections
Use Cases:
- Custom latency tracking
- Measuring specific code paths
- Comparing with automatic metrics
Common Queries:
P95 latency (custom):
histogram_quantile(0.95, sum(rate(app_request_latency_seconds_bucket[5m])) by (le))
Average latency (custom):
rate(app_request_latency_seconds_sum[5m]) / rate(app_request_latency_seconds_count[5m])

Metrics Counting Process
How Metrics Are Collected and Counted
Understanding the complete flow of how metrics are generated, collected, and stored:
Step 1: Request Arrives at FastAPI
Client → HTTP Request → FastAPI Application
Example:
GET http://localhost:8000/todos
Step 2: Instrumentator Intercepts Request
The prometheus-fastapi-instrumentator middleware:
1. Records the start time: start_time = time.time()
2. Extracts metadata:
- Route path: /todos
- HTTP method: GET
3. Waits for the response
Step 3: Route Handler Executes
@app.get("/todos")
async def get_todos():
# Custom metrics code executes:
with RequestTimer(endpoint, method): # Start timer
# ... handler logic ...
REQUEST_COUNTER.labels(...).inc() # Increment counter
return TODOS
# Timer ends, records latency
What Happens:
- RequestTimer.__enter__() records the start time
- The handler executes (may include time.sleep() for simulation)
- REQUEST_COUNTER increments with labels
- RequestTimer.__exit__() calculates the elapsed time and records it in the histogram
Step 4: Response Sent
FastAPI sends HTTP response:
- Status code: 200
- Body: JSON array of todos
Step 5: Instrumentator Records Metrics
After response, instrumentator:
1. Calculates the duration: duration = time.time() - start_time
2. Determines the status group: 200 → 2xx
3. Increments the counter:
http_requests_total{handler="/todos", method="GET", status="2xx"} += 1
4. Records the latency in the histogram:
http_request_duration_seconds_bucket{handler="/todos", method="GET", le="0.1"} += 1
http_request_duration_seconds_bucket{handler="/todos", method="GET", le="0.5"} += 1
...
(All buckets >= duration are incremented)
Step 6: Metrics Exposed via /metrics Endpoint
When Prometheus scrapes http://app:8000/metrics, FastAPI returns:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{handler="/",method="GET",status="2xx"} 10.0
http_requests_total{handler="/todos",method="GET",status="2xx"} 5.0
http_requests_total{handler="/todos",method="POST",status="201"} 2.0
http_requests_total{handler="/error",method="GET",status="5xx"} 1.0
# HELP http_request_duration_seconds Request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.005"} 0.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.01"} 0.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.025"} 0.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.05"} 0.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.1"} 2.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.25"} 3.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="0.5"} 5.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="1.0"} 5.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="2.5"} 5.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="5.0"} 5.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="10.0"} 5.0
http_request_duration_seconds_bucket{handler="/todos",method="GET",le="+Inf"} 5.0
http_request_duration_seconds_count{handler="/todos",method="GET"} 5.0
http_request_duration_seconds_sum{handler="/todos",method="GET"} 0.75
# Custom metrics
# HELP app_requests_total Total number of processed requests in FastAPI app
# TYPE app_requests_total counter
app_requests_total{endpoint="/todos",method="GET",http_status="200"} 5.0
# HELP app_request_latency_seconds Latency of FastAPI requests in seconds
# TYPE app_request_latency_seconds histogram
app_request_latency_seconds_bucket{endpoint="/todos",method="GET",le="0.1"} 2.0
app_request_latency_seconds_bucket{endpoint="/todos",method="GET",le="0.5"} 5.0
...
app_request_latency_seconds_count{endpoint="/todos",method="GET"} 5.0
app_request_latency_seconds_sum{endpoint="/todos",method="GET"} 0.75
Step 7: Prometheus Scrapes and Stores
Prometheus:
1. Sends an HTTP GET to /metrics every 15 seconds
2. Parses the response (Prometheus text format)
3. Stores the time-series data:
- Timestamp: 2024-01-15T10:30:00Z
- Metric: http_requests_total{handler="/todos",...}
- Value: 5.0
4. Indexes by labels for fast queries
Step 8: Grafana Queries Prometheus
When we create a Grafana panel with query:
rate(http_requests_total[5m])
Process:
1. Grafana sends the PromQL query to the Prometheus API
2. Prometheus:
- Retrieves time-series data for the last 5 minutes
- Calculates the rate: (current_value - old_value) / time_delta
- Returns the result
3. Grafana visualizes the result in the panel
Counter Behavior
Counters are monotonically increasing:
Time Value
10:00 0
10:01 5 (+5 requests)
10:02 12 (+7 requests)
10:03 20 (+8 requests)
10:04 25 (+5 requests)
To get rate (requests per second):
rate(http_requests_total[5m])
Calculation:
- At 10:04: (25 - 0) / 240 seconds ≈ 0.104 req/s
- Uses a sliding window (the last 5 minutes)
To get increase (total requests in window):
increase(http_requests_total[5m])
Calculation:
- At 10:04: 25 - 0 = 25 requests (over 5 minutes)
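The rate/increase arithmetic above can be sketched in a few lines. This toy version uses the simple (last - first) / elapsed form from the example; real Prometheus additionally extrapolates to the edges of the window.

```python
# Toy rate()/increase() over in-memory counter samples, mirroring the
# example table above (seconds since 10:00, counter value).

samples = [(0, 0), (60, 5), (120, 12), (180, 20), (240, 25)]

def increase(samples, window):
    """Counter growth over the samples inside the window ending at the last sample."""
    end = samples[-1][0]
    inside = [(t, v) for t, v in samples if t >= end - window]
    return inside[-1][1] - inside[0][1]

def rate(samples, window):
    """Per-second rate over the same window."""
    end = samples[-1][0]
    inside = [(t, v) for t, v in samples if t >= end - window]
    elapsed = inside[-1][0] - inside[0][0]
    return (inside[-1][1] - inside[0][1]) / elapsed

print(increase(samples, 300))        # 25
print(round(rate(samples, 300), 3))  # 0.104
```

A real implementation also has to handle counter resets (a value dropping back toward 0 after a restart), which Prometheus detects and compensates for automatically.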
Histogram Behavior
Histograms track distribution:
For a request that took 0.15 seconds:
Bucket (le) Count Before Count After
0.005 0 0
0.01 0 0
0.025 0 0
0.05 0 0
0.1 0 0
0.25 0 1 ← Request fits here
0.5 0 1
1.0 0 1
...
+Inf 0 1
All buckets >= observed value are incremented.
To calculate percentiles:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
Process:
- Calculate rate for each bucket
- Find bucket where 95% of requests fall
- Interpolate within bucket
- Return latency value
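These interpolation steps can be sketched as a small Python function. This is an illustrative simplification: the real histogram_quantile operates on per-bucket rates rather than raw cumulative counts, but the find-the-bucket-then-interpolate logic is the same.

```python
# Simplified histogram_quantile(): find the bucket where the target rank
# falls, then linearly interpolate within it.

def histogram_quantile(q, buckets):
    """buckets: sorted list of (le, cumulative_count); the last le is +Inf."""
    total = buckets[-1][1]
    rank = q * total  # the rank of the target observation
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if le == float("inf"):
                return prev_le  # quantile lies beyond the last finite bucket
            # linear interpolation inside this bucket
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count

# Buckets condensed from the /todos example above.
buckets = [(0.1, 100), (0.25, 120), (0.5, 125), (float("inf"), 125)]
print(round(histogram_quantile(0.95, buckets), 4))  # 0.2406
```

For these buckets the 95th percentile lands inside the 0.1–0.25 s bucket, and interpolation yields roughly 0.24 s. This also shows why bucket boundaries matter: the quantile can never be more precise than the bucket it falls into.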
Practical Examples
Example 1: Monitoring Traffic Patterns
Scenario: You want to see which endpoints receive the most traffic.
Grafana Panel Setup:
1. Query:
sum by (handler) (rate(http_requests_total[5m]))
2. Visualization: Time series graph
3. Title: “Requests per Second by Endpoint”
Result: Line graph showing traffic per endpoint over time.
Example 2: Detecting Error Spikes
Scenario: You want to be alerted when error rate exceeds 5%.
Grafana Panel Setup:
1. Query:
100 * sum(rate(http_requests_total{status="5xx"}[5m])) / sum(rate(http_requests_total[5m]))
2. Visualization: Stat panel with thresholds
3. Thresholds:
- Green: < 1%
- Yellow: 1–5%
- Red: > 5%
Result: Single number showing error percentage with color coding.
Example 3: Performance Monitoring
Scenario: You want to track P95 latency to identify slow endpoints.
Grafana Panel Setup:
1. Query:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (handler, le))
2. Visualization: Time series graph
3. Y-axis unit: seconds
4. Title: “P95 Latency by Endpoint”
Example 4: Comparing Custom vs Automatic Metrics
Scenario: You want to verify that custom metrics match automatic metrics.
Grafana Panel Setup:
1. Panel 1 — Automatic:
sum(rate(http_requests_total[5m]))
2. Panel 2 — Custom:
sum(rate(app_requests_total[5m]))
Result: Two numbers that should be similar (may differ slightly due to timing).
Conclusion
This FastAPI Observability Lab provides a complete, production-ready example of:
- Instrumenting FastAPI applications with Prometheus metrics
- Configuring Prometheus to scrape and store metrics
- Building Grafana dashboards to visualize application health
- Understanding metric types (counters, histograms) and their use cases
- Writing PromQL queries for common monitoring scenarios
Key Takeaways
- Metrics are essential for understanding application behavior
- Prometheus provides powerful querying capabilities
- Grafana makes metrics accessible through visualizations
- Both automatic and custom metrics have their place
- Percentiles (P95, P99) are crucial for performance monitoring
Next Steps
- Add more endpoints and observe how metrics change
- Create alerting rules in Prometheus for error rates and latency
- Export Grafana dashboards as JSON for version control
- Add business metrics (e.g., todos created, users active)
- Integrate with logging and tracing for complete observability