Master LLMs with our FREE course in collaboration with Activeloop & Intel Disruptor Initiative. Join now!

Publication

Advanced SQL for Data Analysis —Part 1: Subqueries and CTE
Data Analysis   Data Science   Latest   Machine Learning

Advanced SQL for Data Analysis —Part 1: Subqueries and CTE

Last Updated on May 1, 2024 by Editorial Team

Author(s): Carlos da Costa

Originally published on Towards AI.

Photo by Luke Chesser on Unsplash

As query complexity increases, there is a growing need to keep the code organized, modular, and readable. Therefore, it is important to know techniques to maintain code organization. This skill will make you stand out from other analysts. Two tools in SQL are specifically designed for this purpose: subqueries and CTEs. In this tutorial, we will explore these two advanced SQL techniques for data analysis.

Without further ado, let’s get straight to it.

Get the SQL roadmap for free!

SQL: Data Science and Analytics Roadmap

Do you ever wonder what you have to learn to start data analysis with SQL? On this roadmap, there is the 20% of SQL you…

daviddacosta.gumroad.com

Subqueries

Subqueries are queries nested inside another query, where the result of the inner query can be used by the outer query. It can be used in SELECT, INSERT, UPDATE, or DELETE statements.

Image by the author

Subquery in the SELECT statement

Since most of the data analysis in SQL is performed within the SELECT statement, we will start by looking at the application of subqueries in the SELECT statement.

Subquery as one of the columns

Subqueries results can be used as column values in a SELECT statement. In this example, we use a subquery to find the percentage of quantity of orders by customers.

SELECT customer_name, 
AVG(
quantity /
(
SELECT SUM(quantity)
FROM orders
) * 100
) AS "Percentage of order"
FROM orders
GROUP BY 1;

Subquery in the FROM clause

A subquery in a FROM clause behaves like a temporary table or view, they are mostly used to:

  • Perform complex data manipulations or aggregation before joining with other tables.
  • Break down complex logic into smaller parts.
  • Generate temporary results sets.

The subquery in the FROM clause typically appears within parentheses and is aliased with a name

select SUM(male) - SUM(female) AS gender_difference from 
(
SELECT
case when customer_gender = 'Male' THEN 1 ELSE 0 END AS male,
case when customer_gender = 'Female' THEN 1 ELSE 0 END AS female
FROM customer
) AS gender

In the example above, we have used a subquery to find the gender difference in the customer table. In the inner query, we assigned zero and ones to the male and female columns, and in the outer query, we calculated the difference between the total number of males and females.

Subquery in the WHERE and HAVING clause

Subqueries are commonly used to filter the results of the main query. To filter the results, they are placed in the WHERE or HAVING clause.

In the example below, we will use subquery to print out all the products that cost less than the average price.

SELECT * FROM product
WHERE price >= (
SELECT AVG(price) )
FROM product
)

Since we used the WHERE in the example, please add in the comment section an example using the HAVING clause.

Subquery in INSERT statements

Subqueries can also be used within INSERT statements, this is useful when there is a need to insert data into a table from a result of another query.

INSERT INTO orders(customer_name, quantity)
SELECT customer_name, 1 FROM customer WHERE customer_gender = 'Male';

In the example above, we used subquery to insert data into the orders table from the customer table.

Subquey in the UPDATE statement

Similarly to INSERT statements, subqueries can also be used in the UPDATE statement. In the example below, we used it to update the customer name in the orders table for orders with a quantity equal to one, based on the information in the customer table.

UPDATE orders
SET customer_name = (
SELECT customer_name FROM customer WHERE customer_id = 'C10'
)
WHERE quantity = 1;

Subquey in the DELETE statement

And finally, subqueries can also be used in the DELETE statement, so that we can delete rows based on the results of another query.

In the example below we will delete all the orders of customers that are not active.

DELETE FROM orders
WHERE customer_id IN (
SELECT customer_id
FROM customers
WHERE status = 'inactive'
);

Common Table Expressions

Sometimes SQL queries can grow complex becoming challenging to maintain and understand, that is where common table expression also known as CTE comes in. CTEs are temporary result sets defined within the scope of a single SQL statement.

CTE helps to write more readable and modular SQL queries, breaking down complex queries into smaller ones.

Image by the author

A CTE is defined by using the WITH keyword and then the CTE name, followed by an optional list of column names for the CTE, and finally the AS keyword, followed by the query that defines the CTE. Once defined, it can be referenced just like a regular table.

Benefinefits of using CTEs:

  • Readability: Make complex queries easier to read.
  • Modularity: CTEs are defined once and can be referenced multiple times within the same query.
  • Recursive queries: CTEs allow recursive operations.
  • Performance optimization: CTEs can help improve query performance.

In the next example, we will use a CTE to create a separate table containing cleaned data. Let’s say we have a product table where the price column is stored as a text datatype with a dollar sign in each value. To address this, we create a CTE to cleanse the data, removing the dollar signs and converting the price to a decimal format. By using this CTE, we ensure that every calculation involving the price column is performed in the cleaned data, eliminating the need to repeat the cleansing process.

WITH cleansed_product
AS (
SELECT
product_id,
category_id,
product_name,
cast(REPLACE(price,'$','') AS DECIMAL(10,2)) AS price
FROM product
)

SELECT * FROM cleansed_product

Additionally, CTEs are invaluable tools when working with partitions and window functions, enabling complex analytical queries to be expressed in a more readable and modular manner. We’ll delve deeper into these advanced techniques in Part Two of this article.

Final Thoughts

Well, this is just the beginning of advanced SQL for data analysis, in the next article we will look into more techniques that you can use to enhance your data analysis skills.

Hey, Carlos here!

👏 Did you find this tutorial helpful? Please consider Clapping and Follow me to show your support!

🤝 Let’s connect! Join me on LinkedIn for opportunities to collaborate on exciting projects and stay updated on the latest trends in the data world.

🌟 Your feedback and engagement mean the world to me. Thank you for being part of this journey!

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓