SQL for Data Analysts:

Last Updated on December 29, 2025 by Editorial Team

Author(s): Gitanjali

Originally published on Towards AI.

SQL powers most data analyst roles today. If you work with data, chances are you will spend a large part of your day writing SQL queries.

SQL allows you to work directly with data stored in databases like PostgreSQL, MySQL, BigQuery, or SQL Server. For many common analytical tasks. especially when datasets are not extremely large, SQL is often faster and more efficient than loading data into Python and using Pandas.

This guide is not about learning every SQL command. It focuses on the queries analysts actually use daily. The kind of queries used to filter data, join tables, calculate metrics, and answer business questions.

We will cover around 20 essential SQL queries, using realistic sales and e-commerce style datasets. Each query includes clear examples you can copy, run, and practice. These are the same patterns used in real analyst jobs.

You can practice along using free tools like DB-Fiddle, the BigQuery public sandbox, or a local MySQL/PostgreSQL setup. A simple sample sales database will be used throughout the guide so everything feels practical, not theoretical.

Writing SQL regularly matters. Analysts who practice querying data daily become faster at analysis, debugging, and problem-solving, and that skill directly translates into better job opportunities and stronger portfolios.

Sample Database Setup (Run this first in your SQL editor):

CREATE TABLE customers (
 customer_id INT PRIMARY KEY,
 name VARCHAR(50),
 city VARCHAR(30),
 join_date DATE
);
CREATE TABLE orders (
 order_id INT PRIMARY KEY,
 customer_id INT,
 product VARCHAR(50),
 amount DECIMAL(10,2),
 order_date DATE
);
CREATE TABLE products (
 product_id INT PRIMARY KEY,
 product VARCHAR(50),
 category VARCHAR(30),
 price DECIMAL(10,2)
);
INSERT INTO customers VALUES 
(1,'Alice','Delhi', '2024-01-15'),
(2,'Bob','Mumbai','2024-02-20'),
(3,'Charlie','Delhi','2024-03-10');
INSERT INTO orders VALUES 
(101,1,'Laptop',15000,'2024-06-01'),
(102,1,'Mouse',500,'2024-06-05'),
(103,2,'Laptop',15000,'2024-06-10'),
(104,3,'Phone',20000,'2024-06-15'),
(105,1,'Phone',20000,'2024-07-01');

Core Queries: SELECT, WHERE, ORDER BY, LIMIT

1. Basic SELECT Grab columns or everything.

SELECT name, city FROM customers; -- Specific columns
SELECT * FROM orders WHERE amount > 10000 ORDER BY order_date DESC LIMIT 3;

Output: Top 3 high-value orders, newest first. Pitfall: * slows on big tables-name columns. Analyst Use: Quick data previews.kdnuggets+1

2. Filtering with WHERE AND/OR/NOT/IN/BETWEEN/LIKE.

SELECT * FROM orders 
WHERE customer_id IN (1,2) AND amount BETWEEN 500 AND 15000 
AND product LIKE 'Laptop%'; -- Wildcards: % any, _ one char

Pro Tip: >= dates: WHERE order_date >= '2024-06-01'. NULLs: IS NULL not = NULL.linkedin

3. Sorting & Limiting — ORDER BY ASC/DESC, LIMIT/TOP (SQL Server).

SELECT customer_id, SUM(amount) AS total_spent
FROM orders GROUP BY customer_id 
ORDER BY total_spent DESC LIMIT 2;

Aliases (AS) clean output for reports/export to Excel/Python.

Aggregations: GROUP BY, HAVING, COUNT/SUM/AVG

4. GROUP BY Basics -Summarize by category.

SELECT city, COUNT(*) AS customers, AVG(amount) AS avg_order
FROM customers c JOIN orders o ON c.customer_id = o.customer_id
GROUP BY city;

Why HAVING not WHERE? WHERE filters rows pre-group; HAVING post.linkedin+1

5. HAVING for Aggregates -Filter groups.

SELECT customer_id, COUNT(order_id) AS orders
FROM orders 
GROUP BY customer_id 
HAVING COUNT(order_id) > 1 AND SUM(amount) > 20000;

Business: High-value repeat buyers for marketing.

Joins: The 80% Daily Workhorse

Data lives in tables — JOIN glues them. Key Types:

| Join Type | What It Keeps | Common Use Case | Visual Idea |
|------------------|----------------------------------------|------------------------------------------------------|-------------|
| INNER JOIN | Only matching rows from both tables | Customers who placed orders | Overlap |
| LEFT JOIN | All rows from left + matches from right| All customers and their orders (NULL if none) | Left full |
| RIGHT JOIN | All rows from right + matches from left| All products and their sales (NULL if none) | Right full |
| FULL OUTER JOIN | All rows from both tables | Combining two datasets with partial overlap | Both full |

If you want it to hit harder (and readers actually remember it), right after the table, add one clean

6. INNER JOIN -Most common.

SELECT c.name, o.product, o.amount
FROM customers c 
INNER JOIN orders o ON c.customer_id = o.customer_id;

7. LEFT JOIN: All customers, even no orders.

SELECT c.name, COUNT(o.order_id) AS orders_placed
FROM customers c 
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

Pitfall: Multiple matches inflate COUNT-use DISTINCT.abeltavares.hashnode+1

8. Multi-Table + Self-Join-Products to orders.

SELECT o.product, p.category, SUM(o.amount) AS revenue
FROM orders o 
JOIN products p ON o.product = p.product
GROUP BY o.product, p.category;

Subqueries & CTEs: Break Complex into Simple

9. Subquery: Query inside query.

SELECT name FROM customers 
WHERE customer_id IN (
 SELECT customer_id FROM orders 
 WHERE amount > 15000
);

Performance: Correlated (runs per row) slow, avoid on big data.youtube

10. CTE (WITH) -Named temp result, readable.

WITH high_spenders AS (
 SELECT customer_id, SUM(amount) AS total
 FROM orders GROUP BY customer_id HAVING total > 20000
)
SELECT c.name, hs.total FROM high_spenders hs
JOIN customers c ON hs.customer_id = c.customer_id;

Recursive CTE: Hierarchies (org charts): WITH RECURSIVE.abeltavares.hashnode+1

Window Functions: Rankings Without Messy Self-Joins

11. ROW_NUMBER/RANK/LAG/LEAD -Analytics gold.

SELECT 
 customer_id,
 amount,
 order_date,
 ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_seq,
 LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount
FROM orders;

ROW_NUMBER: Unique seq.
RANK: Ties share (1,1,3).
LAG: Previous row value (YoY growth).abeltavares.hashnode

12. Running Totals & Percentiles.

SELECT 
 order_date,
 amount,
 SUM(amount) OVER (ORDER BY order_date ROWS UNBOUNDED PRECEDING) AS running_total
FROM orders;

String/Date Functions: Clean Real Data

13. Strings — UPPER/LOWER/LEN/SUBSTRING.

SELECT 
 UPPER(name) AS name_upper,
 LENGTH(product) AS name_len,
 SUBSTRING(product, 1, 3) AS short_prod -- MySQL; SUBSTR PostgreSQL
FROM customers c JOIN orders o ON c.customer_id = o.customer_id;

14. Dates — Essential for analysts.

SELECT 
 order_date,
 DAY(order_date) AS day,
 MONTHNAME(order_date) AS month, -- MySQL
 DATEDIFF('2024-12-27', order_date) AS days_ago
FROM orders;

Group by Month: GROUP BY YEAR(order_date), MONTH(order_date).geeksforgeeks

Advanced: CASE, Unions, Views

15. CASE -If-then in SQL.

SELECT 
 name,
 CASE 
 WHEN total > 20000 THEN 'VIP'
 WHEN total > 10000 THEN 'Regular'
 ELSE 'Newbie'
 END AS segment
FROM (/* your CTE */) t;

16. UNION Stack results (same columns).

SELECT name FROM customers 
UNION 
SELECT product FROM products;

17. CREATE VIEW — Saved query.

CREATE VIEW monthly_sales AS
SELECT MONTH(order_date) AS month, SUM(amount) AS revenue
FROM orders GROUP BY month;
-- Use: SELECT * FROM monthly_sales;

Performance & Best Practices

Indexes: CREATE INDEX idx_customer ON orders(customer_id); – Speeds WHERE/JOIN.
EXPLAIN: Prefix queries to see plan.
**Avoid SELECT *** — Name columns.
BigQuery Tip: Partition tables by date.
Common Errors:
| Error | Fix |
| — — — -| — — -|
| ORA-00904 invalid identifier | Column name wrong |
| Syntax error near ‘)’ | Missing comma |
| Too slow | Add LIMIT, indexes, or CTE over subquery |

Portfolio Projects:

Sales Dashboard Query: Joins + Window for top customers by revenue rank.
Churn Analysis: Customers with no orders last 90 days (DATE_SUB).
A/B Test: GROUP BY variant, AVG metrics, t-test ready export.
Python Tie-In: Query to CSV → pd.read_csv() for viz.

Query Kaggle datasets daily. Master these 17, and you’re interview-ready.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Towards AI Academy

We Build Enterprise-Grade AI. We'll Teach You to Master It Too.

15 engineers. 100,000+ students. Towards AI Academy teaches what actually survives production.

Start free — no commitment:

→ Agents Architecture Cheatsheet — 3 years of architecture decisions in 6 pages

Our courses:

→ AI Engineering Certification — 90+ lessons from project selection to deployed product. The most comprehensive practical LLM course out there.

→ Agent Engineering Course — Hands on with production agent architectures, memory, routing, and eval frameworks — built from real enterprise engagements.

→ AI for Work — Understand, evaluate, and apply AI for complex work tasks.

Note: Article content contains the views of the contributing authors and not Towards AI.

Frequently Used, Contextual References

Resources

SQL for Data Analysts:

Author(s): Gitanjali

Core Queries: SELECT, WHERE, ORDER BY, LIMIT

Aggregations: GROUP BY, HAVING, COUNT/SUM/AVG

Joins: The 80% Daily Workhorse

Subqueries & CTEs: Break Complex into Simple

Window Functions: Rankings Without Messy Self-Joins

String/Date Functions: Clean Real Data

Advanced: CASE, Unions, Views

Performance & Best Practices

Portfolio Projects:

Towards AI Academy

We Build Enterprise-Grade AI. We'll Teach You to Master It Too.

Recent Posts

Full-Stack Data Scientists for the Agentic Coding World

Building Production-Grade AI Skills with Snowflake Cortex AI Function Studio

I Tried 10 AI Agent Frameworks in 2026 — Here’s the Honest Guide I Wish I Had Earlier

How One Spring Boot Optimization Saved Our Startup $30,000 a Year

Inside Palantir AIP: How the World’s Most Controversial AI Platform Actually Works

What Is a Reverse Proxy? (And Why Every Backend Developer Should Care)

What Claude Opus 4.8 Actually Changes If You’re Building Agents

QWEN 3.7 Max Worked For 35 Hrs Straight And The Results Were Mind-blowing

Comprehensive AI Engineering and AI for Work certifications

Company

CONTACT US

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

SQL for Data Analysts:

Author(s): Gitanjali

Core Queries: SELECT, WHERE, ORDER BY, LIMIT

Aggregations: GROUP BY, HAVING, COUNT/SUM/AVG

Joins: The 80% Daily Workhorse

Subqueries & CTEs: Break Complex into Simple

Window Functions: Rankings Without Messy Self-Joins

String/Date Functions: Clean Real Data

Advanced: CASE, Unions, Views

Performance & Best Practices

Portfolio Projects:

Towards AI Academy

We Build Enterprise-Grade AI. We'll Teach You to Master It Too.

Related posts

Recent Posts

Comprehensive AI Engineering and AI for Work certifications

Company

CONTACT US

GDPR CCPA Statement