Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Read by thought-leaders and decision-makers around the world. Phone Number: +1-650-246-9381 Email: pub@towardsai.net
228 Park Avenue South New York, NY 10003 United States
Website: Publisher: https://towardsai.net/#publisher Diversity Policy: https://towardsai.net/about Ethics Policy: https://towardsai.net/about Masthead: https://towardsai.net/about
Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Founders: Roberto Iriondo, , Job Title: Co-founder and Advisor Works for: Towards AI, Inc. Follow Roberto: X, LinkedIn, GitHub, Google Scholar, Towards AI Profile, Medium, ML@CMU, FreeCodeCamp, Crunchbase, Bloomberg, Roberto Iriondo, Generative AI Lab, Generative AI Lab VeloxTrend Ultrarix Capital Partners Denis Piffaretti, Job Title: Co-founder Works for: Towards AI, Inc. Louie Peters, Job Title: Co-founder Works for: Towards AI, Inc. Louis-François Bouchard, Job Title: Co-founder Works for: Towards AI, Inc. Cover:
Towards AI Cover
Logo:
Towards AI Logo
Areas Served: Worldwide Alternate Name: Towards AI, Inc. Alternate Name: Towards AI Co. Alternate Name: towards ai Alternate Name: towardsai Alternate Name: towards.ai Alternate Name: tai Alternate Name: toward ai Alternate Name: toward.ai Alternate Name: Towards AI, Inc. Alternate Name: towardsai.net Alternate Name: pub.towardsai.net
5 stars – based on 497 reviews

Frequently Used, Contextual References

TODO: Remember to copy unique IDs whenever it needs used. i.e., URL: 304b2e42315e

Resources

Free: 6-day Agentic AI Engineering Email Guide.
Learnings from Towards AI's hands-on work with real clients.
SQL for Data Analysts:
Data Science   Latest   Machine Learning

SQL for Data Analysts:

Last Updated on December 29, 2025 by Editorial Team

Author(s): Gitanjali

Originally published on Towards AI.

SQL for Data Analysts:

SQL powers most data analyst roles today. If you work with data, chances are you will spend a large part of your day writing SQL queries.

SQL allows you to work directly with data stored in databases like PostgreSQL, MySQL, BigQuery, or SQL Server. For many common analytical tasks. especially when datasets are not extremely large, SQL is often faster and more efficient than loading data into Python and using Pandas.

This guide is not about learning every SQL command. It focuses on the queries analysts actually use daily. The kind of queries used to filter data, join tables, calculate metrics, and answer business questions.

We will cover around 20 essential SQL queries, using realistic sales and e-commerce style datasets. Each query includes clear examples you can copy, run, and practice. These are the same patterns used in real analyst jobs.

You can practice along using free tools like DB-Fiddle, the BigQuery public sandbox, or a local MySQL/PostgreSQL setup. A simple sample sales database will be used throughout the guide so everything feels practical, not theoretical.

Writing SQL regularly matters. Analysts who practice querying data daily become faster at analysis, debugging, and problem-solving, and that skill directly translates into better job opportunities and stronger portfolios.​

Sample Database Setup (Run this first in your SQL editor):

CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(50),
city VARCHAR(30),
join_date DATE
);
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
product VARCHAR(50),
amount DECIMAL(10,2),
order_date DATE
);
CREATE TABLE products (
product_id INT PRIMARY KEY,
product VARCHAR(50),
category VARCHAR(30),
price DECIMAL(10,2)
);
INSERT INTO customers VALUES
(1,'Alice','Delhi', '2024-01-15'),
(2,'Bob','Mumbai','2024-02-20'),
(3,'Charlie','Delhi','2024-03-10');
INSERT INTO orders VALUES
(101,1,'Laptop',15000,'2024-06-01'),
(102,1,'Mouse',500,'2024-06-05'),
(103,2,'Laptop',15000,'2024-06-10'),
(104,3,'Phone',20000,'2024-06-15'),
(105,1,'Phone',20000,'2024-07-01');

Core Queries: SELECT, WHERE, ORDER BY, LIMIT

1. Basic SELECT Grab columns or everything.

SELECT name, city FROM customers; -- Specific columns
SELECT * FROM orders WHERE amount > 10000 ORDER BY order_date DESC LIMIT 3;

Output: Top 3 high-value orders, newest first. Pitfall: * slows on big tables-name columns. Analyst Use: Quick data previews.kdnuggets+1

2. Filtering with WHERE AND/OR/NOT/IN/BETWEEN/LIKE.

SELECT * FROM orders 
WHERE customer_id IN (1,2) AND amount BETWEEN 500 AND 15000
AND product LIKE 'Laptop%'; -- Wildcards: % any, _ one char

Pro Tip: >= dates: WHERE order_date >= '2024-06-01'. NULLs: IS NULL not = NULL.linkedin

3. Sorting & Limiting — ORDER BY ASC/DESC, LIMIT/TOP (SQL Server).

SELECT customer_id, SUM(amount) AS total_spent
FROM orders GROUP BY customer_id
ORDER BY total_spent DESC LIMIT 2;

Aliases (AS) clean output for reports/export to Excel/Python.

Aggregations: GROUP BY, HAVING, COUNT/SUM/AVG

4. GROUP BY Basics -Summarize by category.

SELECT city, COUNT(*) AS customers, AVG(amount) AS avg_order
FROM customers c JOIN orders o ON c.customer_id = o.customer_id
GROUP BY city;

Why HAVING not WHERE? WHERE filters rows pre-group; HAVING post.linkedin+1

5. HAVING for Aggregates -Filter groups.

SELECT customer_id, COUNT(order_id) AS orders
FROM orders
GROUP BY customer_id
HAVING COUNT(order_id) > 1 AND SUM(amount) > 20000;

Business: High-value repeat buyers for marketing.

Joins: The 80% Daily Workhorse

Data lives in tables — JOIN glues them. Key Types:

| Join Type | What It Keeps | Common Use Case | Visual Idea |
|------------------|----------------------------------------|------------------------------------------------------|-------------|
| INNER JOIN | Only matching rows from both tables | Customers who placed orders | Overlap |
| LEFT JOIN | All rows from left + matches from right| All customers and their orders (NULL if none) | Left full |
| RIGHT JOIN | All rows from right + matches from left| All products and their sales (NULL if none) | Right full |
| FULL OUTER JOIN | All rows from both tables | Combining two datasets with partial overlap | Both full |

If you want it to hit harder (and readers actually remember it), right after the table, add one clean

6. INNER JOIN -Most common.

SELECT c.name, o.product, o.amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

7. LEFT JOIN: All customers, even no orders.

SELECT c.name, COUNT(o.order_id) AS orders_placed
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

Pitfall: Multiple matches inflate COUNT-use DISTINCT.abeltavares.hashnode+1

8. Multi-Table + Self-Join-Products to orders.

SELECT o.product, p.category, SUM(o.amount) AS revenue
FROM orders o
JOIN products p ON o.product = p.product
GROUP BY o.product, p.category;

Subqueries & CTEs: Break Complex into Simple

9. Subquery: Query inside query.

SELECT name FROM customers 
WHERE customer_id IN (
SELECT customer_id FROM orders
WHERE amount > 15000
);

Performance: Correlated (runs per row) slow, avoid on big data.youtube​

10. CTE (WITH) -Named temp result, readable.

WITH high_spenders AS (
SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id HAVING total > 20000
)
SELECT c.name, hs.total FROM high_spenders hs
JOIN customers c ON hs.customer_id = c.customer_id;

Recursive CTE: Hierarchies (org charts): WITH RECURSIVE.abeltavares.hashnode+1

Window Functions: Rankings Without Messy Self-Joins

11. ROW_NUMBER/RANK/LAG/LEAD -Analytics gold.

SELECT 
customer_id,
amount,
order_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_seq,
LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount
FROM orders;
  • ROW_NUMBER: Unique seq.
  • RANK: Ties share (1,1,3).
  • LAG: Previous row value (YoY growth).abeltavares.hashnode

12. Running Totals & Percentiles.

SELECT 
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date ROWS UNBOUNDED PRECEDING) AS running_total
FROM orders;

String/Date Functions: Clean Real Data

13. Strings — UPPER/LOWER/LEN/SUBSTRING.

SELECT 
UPPER(name) AS name_upper,
LENGTH(product) AS name_len,
SUBSTRING(product, 1, 3) AS short_prod -- MySQL; SUBSTR PostgreSQL
FROM customers c JOIN orders o ON c.customer_id = o.customer_id;

14. Dates — Essential for analysts.

SELECT 
order_date,
DAY(order_date) AS day,
MONTHNAME(order_date) AS month, -- MySQL
DATEDIFF('2024-12-27', order_date) AS days_ago
FROM orders;

Group by Month: GROUP BY YEAR(order_date), MONTH(order_date).geeksforgeeks

Advanced: CASE, Unions, Views

15. CASE -If-then in SQL.

SELECT 
name,
CASE
WHEN total > 20000 THEN 'VIP'
WHEN total > 10000 THEN 'Regular'
ELSE 'Newbie'
END AS segment
FROM (/* your CTE */) t;

16. UNION Stack results (same columns).

SELECT name FROM customers 
UNION
SELECT product FROM products;

17. CREATE VIEW — Saved query.

CREATE VIEW monthly_sales AS
SELECT MONTH(order_date) AS month, SUM(amount) AS revenue
FROM orders GROUP BY month;
-- Use: SELECT * FROM monthly_sales;

Performance & Best Practices

  • Indexes: CREATE INDEX idx_customer ON orders(customer_id); – Speeds WHERE/JOIN.
  • EXPLAIN: Prefix queries to see plan.
  • **Avoid SELECT *** — Name columns.
  • BigQuery Tip: Partition tables by date.
    Common Errors:
    | Error | Fix |
    | — — — -| — — -|
    | ORA-00904 invalid identifier | Column name wrong |
    | Syntax error near ‘)’ | Missing comma |
    | Too slow | Add LIMIT, indexes, or CTE over subquery |

Portfolio Projects:

  1. Sales Dashboard Query: Joins + Window for top customers by revenue rank.
  2. Churn Analysis: Customers with no orders last 90 days (DATE_SUB).
  3. A/B Test: GROUP BY variant, AVG metrics, t-test ready export.
  4. Python Tie-In: Query to CSV → pd.read_csv() for viz.

Query Kaggle datasets daily. Master these 17, and you’re interview-ready.

  1. https://www.geeksforgeeks.org/sql/sql-data-analysis/
  2. https://www.linkedin.com/pulse/copy-top-8-sql-queries-every-data-analyst-should-know-examples-pi9jf
  3. https://www.kdnuggets.com/sql-for-data-analysts-essential-queries-for-data-extraction-transformation
  4. https://www.linkedin.com/pulse/sql-queries-youll-use-real-world-analytics-jobs-mryuf

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI


Towards AI Academy

We Build Enterprise-Grade AI. We'll Teach You to Master It Too.

15 engineers. 100,000+ students. Towards AI Academy teaches what actually survives production.

Start free — no commitment:

6-Day Agentic AI Engineering Email Guide — one practical lesson per day

Agents Architecture Cheatsheet — 3 years of architecture decisions in 6 pages

Our courses:

AI Engineering Certification — 90+ lessons from project selection to deployed product. The most comprehensive practical LLM course out there.

Agent Engineering Course — Hands on with production agent architectures, memory, routing, and eval frameworks — built from real enterprise engagements.

AI for Work — Understand, evaluate, and apply AI for complex work tasks.

Note: Article content contains the views of the contributing authors and not Towards AI.