SQL for Data Analysts:
Last Updated on December 29, 2025 by Editorial Team
Author(s): Gitanjali
Originally published on Towards AI.

SQL powers most data analyst roles today. If you work with data, chances are you will spend a large part of your day writing SQL queries.
SQL allows you to work directly with data stored in databases like PostgreSQL, MySQL, BigQuery, or SQL Server. For many common analytical tasks. especially when datasets are not extremely large, SQL is often faster and more efficient than loading data into Python and using Pandas.
This guide is not about learning every SQL command. It focuses on the queries analysts actually use daily. The kind of queries used to filter data, join tables, calculate metrics, and answer business questions.
We will cover around 20 essential SQL queries, using realistic sales and e-commerce style datasets. Each query includes clear examples you can copy, run, and practice. These are the same patterns used in real analyst jobs.
You can practice along using free tools like DB-Fiddle, the BigQuery public sandbox, or a local MySQL/PostgreSQL setup. A simple sample sales database will be used throughout the guide so everything feels practical, not theoretical.
Writing SQL regularly matters. Analysts who practice querying data daily become faster at analysis, debugging, and problem-solving, and that skill directly translates into better job opportunities and stronger portfolios.
Sample Database Setup (Run this first in your SQL editor):
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(50),
city VARCHAR(30),
join_date DATE
);
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
product VARCHAR(50),
amount DECIMAL(10,2),
order_date DATE
);
CREATE TABLE products (
product_id INT PRIMARY KEY,
product VARCHAR(50),
category VARCHAR(30),
price DECIMAL(10,2)
);
INSERT INTO customers VALUES
(1,'Alice','Delhi', '2024-01-15'),
(2,'Bob','Mumbai','2024-02-20'),
(3,'Charlie','Delhi','2024-03-10');
INSERT INTO orders VALUES
(101,1,'Laptop',15000,'2024-06-01'),
(102,1,'Mouse',500,'2024-06-05'),
(103,2,'Laptop',15000,'2024-06-10'),
(104,3,'Phone',20000,'2024-06-15'),
(105,1,'Phone',20000,'2024-07-01');
Core Queries: SELECT, WHERE, ORDER BY, LIMIT
1. Basic SELECT Grab columns or everything.
SELECT name, city FROM customers; -- Specific columns
SELECT * FROM orders WHERE amount > 10000 ORDER BY order_date DESC LIMIT 3;
Output: Top 3 high-value orders, newest first. Pitfall: * slows on big tables-name columns. Analyst Use: Quick data previews.kdnuggets+1
2. Filtering with WHERE AND/OR/NOT/IN/BETWEEN/LIKE.
SELECT * FROM orders
WHERE customer_id IN (1,2) AND amount BETWEEN 500 AND 15000
AND product LIKE 'Laptop%'; -- Wildcards: % any, _ one char
Pro Tip: >= dates: WHERE order_date >= '2024-06-01'. NULLs: IS NULL not = NULL.linkedin
3. Sorting & Limiting — ORDER BY ASC/DESC, LIMIT/TOP (SQL Server).
SELECT customer_id, SUM(amount) AS total_spent
FROM orders GROUP BY customer_id
ORDER BY total_spent DESC LIMIT 2;
Aliases (AS) clean output for reports/export to Excel/Python.
Aggregations: GROUP BY, HAVING, COUNT/SUM/AVG
4. GROUP BY Basics -Summarize by category.
SELECT city, COUNT(*) AS customers, AVG(amount) AS avg_order
FROM customers c JOIN orders o ON c.customer_id = o.customer_id
GROUP BY city;
Why HAVING not WHERE? WHERE filters rows pre-group; HAVING post.linkedin+1
5. HAVING for Aggregates -Filter groups.
SELECT customer_id, COUNT(order_id) AS orders
FROM orders
GROUP BY customer_id
HAVING COUNT(order_id) > 1 AND SUM(amount) > 20000;
Business: High-value repeat buyers for marketing.
Joins: The 80% Daily Workhorse
Data lives in tables — JOIN glues them. Key Types:
| Join Type | What It Keeps | Common Use Case | Visual Idea |
|------------------|----------------------------------------|------------------------------------------------------|-------------|
| INNER JOIN | Only matching rows from both tables | Customers who placed orders | Overlap |
| LEFT JOIN | All rows from left + matches from right| All customers and their orders (NULL if none) | Left full |
| RIGHT JOIN | All rows from right + matches from left| All products and their sales (NULL if none) | Right full |
| FULL OUTER JOIN | All rows from both tables | Combining two datasets with partial overlap | Both full |
If you want it to hit harder (and readers actually remember it), right after the table, add one clean
6. INNER JOIN -Most common.
SELECT c.name, o.product, o.amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
7. LEFT JOIN: All customers, even no orders.
SELECT c.name, COUNT(o.order_id) AS orders_placed
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
Pitfall: Multiple matches inflate COUNT-use DISTINCT.abeltavares.hashnode+1
8. Multi-Table + Self-Join-Products to orders.
SELECT o.product, p.category, SUM(o.amount) AS revenue
FROM orders o
JOIN products p ON o.product = p.product
GROUP BY o.product, p.category;
Subqueries & CTEs: Break Complex into Simple
9. Subquery: Query inside query.
SELECT name FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders
WHERE amount > 15000
);
Performance: Correlated (runs per row) slow, avoid on big data.youtube
10. CTE (WITH) -Named temp result, readable.
WITH high_spenders AS (
SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id HAVING total > 20000
)
SELECT c.name, hs.total FROM high_spenders hs
JOIN customers c ON hs.customer_id = c.customer_id;
Recursive CTE: Hierarchies (org charts): WITH RECURSIVE.abeltavares.hashnode+1
Window Functions: Rankings Without Messy Self-Joins
11. ROW_NUMBER/RANK/LAG/LEAD -Analytics gold.
SELECT
customer_id,
amount,
order_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_seq,
LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount
FROM orders;
- ROW_NUMBER: Unique seq.
- RANK: Ties share (1,1,3).
- LAG: Previous row value (YoY growth).abeltavares.hashnode
12. Running Totals & Percentiles.
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date ROWS UNBOUNDED PRECEDING) AS running_total
FROM orders;
String/Date Functions: Clean Real Data
13. Strings — UPPER/LOWER/LEN/SUBSTRING.
SELECT
UPPER(name) AS name_upper,
LENGTH(product) AS name_len,
SUBSTRING(product, 1, 3) AS short_prod -- MySQL; SUBSTR PostgreSQL
FROM customers c JOIN orders o ON c.customer_id = o.customer_id;
14. Dates — Essential for analysts.
SELECT
order_date,
DAY(order_date) AS day,
MONTHNAME(order_date) AS month, -- MySQL
DATEDIFF('2024-12-27', order_date) AS days_ago
FROM orders;
Group by Month: GROUP BY YEAR(order_date), MONTH(order_date).geeksforgeeks
Advanced: CASE, Unions, Views
15. CASE -If-then in SQL.
SELECT
name,
CASE
WHEN total > 20000 THEN 'VIP'
WHEN total > 10000 THEN 'Regular'
ELSE 'Newbie'
END AS segment
FROM (/* your CTE */) t;
16. UNION Stack results (same columns).
SELECT name FROM customers
UNION
SELECT product FROM products;
17. CREATE VIEW — Saved query.
CREATE VIEW monthly_sales AS
SELECT MONTH(order_date) AS month, SUM(amount) AS revenue
FROM orders GROUP BY month;
-- Use: SELECT * FROM monthly_sales;
Performance & Best Practices
- Indexes:
CREATE INDEX idx_customer ON orders(customer_id);– Speeds WHERE/JOIN. - EXPLAIN: Prefix queries to see plan.
- **Avoid SELECT *** — Name columns.
- BigQuery Tip: Partition tables by date.
Common Errors:
| Error | Fix |
| — — — -| — — -|
| ORA-00904 invalid identifier | Column name wrong |
| Syntax error near ‘)’ | Missing comma |
| Too slow | Add LIMIT, indexes, or CTE over subquery |
Portfolio Projects:
- Sales Dashboard Query: Joins + Window for top customers by revenue rank.
- Churn Analysis: Customers with no orders last 90 days (DATE_SUB).
- A/B Test: GROUP BY variant, AVG metrics, t-test ready export.
- Python Tie-In: Query to CSV →
pd.read_csv()for viz.
Query Kaggle datasets daily. Master these 17, and you’re interview-ready.
- https://www.geeksforgeeks.org/sql/sql-data-analysis/
- https://www.linkedin.com/pulse/copy-top-8-sql-queries-every-data-analyst-should-know-examples-pi9jf
- https://www.kdnuggets.com/sql-for-data-analysts-essential-queries-for-data-extraction-transformation
- https://www.linkedin.com/pulse/sql-queries-youll-use-real-world-analytics-jobs-mryuf
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI Academy
We Build Enterprise-Grade AI. We'll Teach You to Master It Too.
15 engineers. 100,000+ students. Towards AI Academy teaches what actually survives production.
Start free — no commitment:
→ 6-Day Agentic AI Engineering Email Guide — one practical lesson per day
→ Agents Architecture Cheatsheet — 3 years of architecture decisions in 6 pages
Our courses:
→ AI Engineering Certification — 90+ lessons from project selection to deployed product. The most comprehensive practical LLM course out there.
→ Agent Engineering Course — Hands on with production agent architectures, memory, routing, and eval frameworks — built from real enterprise engagements.
→ AI for Work — Understand, evaluate, and apply AI for complex work tasks.
Note: Article content contains the views of the contributing authors and not Towards AI.