Master LLMs with our FREE course in collaboration with Activeloop & Intel Disruptor Initiative. Join now!

Publication

Top 10 SQL Queries a Data Scientist should know
Latest

Top 10 SQL Queries a Data Scientist should know

Last Updated on September 28, 2022 by Editorial Team

Author(s): Saurabh Saxena

Originally published on Towards AI the World’s Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses.

Top 10 SQL Queries a Data Scientist Should Know

Photo by Campaign Creators on Unsplash

Structured Query Language or SQL is a query-based universal language to read, write and manage databases. In any Machine Learning pipeline, whether it is data, metadata, or logs, SQL is widely used in all operations.

When you are working with a database, the four basic operations any must know is CRUD (Create, Read, Update, and Delete). However, we will target read, update, and delete operations in this blog.

Let’s Start from the basics.

1) SELECT

If you want to know the first_name, last_name, and email of the customers, you can specify the list of the desired columns along with the SELECT and FROM keywords.

SELECT first_name,
last_name,
email
FROM customer
Image by Author

Whereas, to query all the columns in the table, then asterisks(*) is the one that goes with SELECT .

SELECT * 
FROM customer
Image by Author

2) DISTINCT

DISTINCT restricts the duplication of rows in the query. Example: Query to print the creation date of the customers.

SELECT DISTINCT create_date
FROM customer
Image by Author

3) WHERE

WHERE is used in a query to filter the result. Example: Query all the addresses in the Texas district.

SELECT * FROM address
WHERE district='Texas'
Image by Author

4) GROUP BY and HAVING

GROUP BY clause clusters the rows with the same values. Example: What is the total payment made by each customer?

SELECT customer_id,
SUM(amount) AS total_amount
FROM payment
GROUP BY customer_id
Image by Author

In SQL, aggregation functions such as SUM, AVG, andCOUNT can not be used in the WHERE clause. We need to use the HAVING clause.

SELECT customer_id,
SUM(amount) AS total_amount
FROM payment
GROUP BY customer_id
HAVING SUM(amount) <=50
Image by Author

5) ORDER BY and LIMIT

ORDER BY sorts the results in ascending or descending order based on certain columns, ASC and DESC keywords decide the order of the sort. Example: Find the total payment made by each customer and sort in ascending order.

SELECT customer_id,
SUM(amount) AS total_amount
FROM payment
GROUP BY customer_id
ORDER BY total_amount ASC
Image by Author

LIMIT restricts the number of rows to a specified number in the result. Example: Find the top 3 highest paying customers.

SELECT customer_id, 
SUM(amount) AS total_amount
FROM payment
GROUP BY customer_id
ORDER BY total_amount DESC
LIMIT 3
Image by Author

Note: Default Order By clause sorts the result in ASCENDING order. Few SQL providers support TOP n keywords to limit the number of rows in the result.

6) CASE

The CASE expression goes through conditions and returns a value when the first condition is met. Example: A customer is a premium customer if the amount exceeds 50.

Image by Author
SELECT customer_id,
SUM(amount) AS total_amount,
CASE
WHEN SUM(amount)>=50 THEN 'Premium Customer'
ELSE 'Standard Customer'
END AS customer_status
FROM payment
GROUP BY customer_id
ORDER BY total_amount ASC
Image by Author

7) JOINS

INNER JOIN that will result in the common rows between two tables.

LEFT JOINreturns all rows from the left table and the matching rows from the right table. If no matching rows are found in the right table, NULL is used. RIGHT JOIN is a vice-versa of left join.

FULL JOIN is a combination of left and right join. If no matching rows are found in the left or right table, NULL is used.

Image by Author
SELECT c.first_name,
c.last_name,
c.customer_id,
SUM(p.amount)
FROM customer c
INNER JOIN payment p
ON c.customer_id = p.customer_id
GROUP BY c.first_name,
c.last_name,
c.customer_id
Image by Author
SELECT c.first_name,
c.last_name,
c.customer_id,
SUM(p.amount)
FROM customer c
LEFT JOIN payment p
ON c.customer_id = p.customer_id
GROUP BY c.first_name,
c.last_name,
c.customer_id
Image by Author

8) Subqueries

A subquery is a SQL query nested inside a larger query. Example:

SELECT payment_id,
amount,
(SELECT SUM(amount) FROM payment) AS total_amount
FROM payment
Image by Author

9) Windows Function with Rank

Window functions apply to aggregate and ranking functions over a particular window (set of rows). OVER clause is used with window functions to define that window. It can be combined with PARTITION BY and ORDER BY .

SELECT customer_id, 
SUM(amount) AS total_amount,
RANK() OVER (ORDER BY SUM(amount) DESC)
FROM payment
GROUP BY customer_id
Image by Author

10) INSERT, UPDATE, DELETE and TRUNCATE

As the name suggests INSERT is used to push one or more records into the table, while UPDATEis used to modify values in the table based on certain conditions provided by WHERE clause.

DELETE and TRUNCATE both are used to remove records from the table where DELETE removed the record based on WHERE condition and TRUNCATE removed all the records in any table.

Note: We have used PostgreSQL, which is an open-source Relational DBMS, pgAdmin as a Client. Queries ran on ‘dvdrental’ database, and I have mentioned all the resources about the database and PostgreSQL in the reference section.

References:

[1] PostgreSQL Official Page. https://www.postgresql.org/

[2] pgAdmin Official Page. https://www.pgadmin.org/

[3] PostgreSQL docker Image. https://hub.docker.com/_/postgres

[4] DVD Rental DB. https://www.postgresqltutorial.com/postgresql-getting-started/load-postgresql-sample-database/


Top 10 SQL Queries a Data Scientist should know was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Join thousands of data leaders on the AI newsletter. It’s free, we don’t spam, and we never share your email address. Keep up to date with the latest work in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓