Top 10 SQL Queries a Data Scientist should know
Last Updated on September 28, 2022 by Editorial Team
Author(s): Saurabh Saxena
Originally published on Towards AI the World’s Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses.
Top 10 SQL Queries a Data Scientist ShouldΒ Know
Structured Query Language or SQL is a query-based universal language to read, write and manage databases. In any Machine Learning pipeline, whether it is data, metadata, or logs, SQL is widely used in all operations.
When you are working with a database, the four basic operations any must know is CRUD (Create, Read, Update, and Delete). However, we will target read, update, and delete operations in thisΒ blog.
Letβs Start from theΒ basics.
1) SELECT
If you want to know the first_name, last_name, and email of the customers, you can specify the list of the desired columns along with the SELECT and FROM keywords.
SELECT first_name,
last_name,
FROM customer
Whereas, to query all the columns in the table, then asterisks(*) is the one that goes with SELECTΒ .
SELECT *
FROM customer
2) DISTINCT
DISTINCT restricts the duplication of rows in the query. Example: Query to print the creation date of the customers.
SELECT DISTINCT create_date
FROM customer
3) WHERE
WHERE is used in a query to filter the result. Example: Query all the addresses in the Texas district.
SELECT * FROM address
WHERE district='Texas'
4) GROUP BY andΒ HAVING
GROUP BY clause clusters the rows with the same values. Example: What is the total payment made by each customer?
SELECT customer_id,
SUM(amount) AS total_amount
FROM payment
GROUP BY customer_id
In SQL, aggregation functions such as SUM, AVG, andCOUNT can not be used in the WHERE clause. We need to use the HAVINGΒ clause.
SELECT customer_id,
SUM(amount) AS total_amount
FROM payment
GROUP BY customer_id
HAVING SUM(amount) <=50
5) ORDER BY andΒ LIMIT
ORDER BY sorts the results in ascending or descending order based on certain columns, ASC and DESC keywords decide the order of the sort. Example: Find the total payment made by each customer and sort in ascending order.
SELECT customer_id,
SUM(amount) AS total_amount
FROM payment
GROUP BY customer_id
ORDER BY total_amount ASC
LIMIT restricts the number of rows to a specified number in the result. Example: Find the top 3 highest paying customers.
SELECT customer_id,
SUM(amount) AS total_amount
FROM payment
GROUP BY customer_id
ORDER BY total_amount DESC
LIMIT 3
Note: Default Order By clause sorts the result in ASCENDING order. Few SQL providers support TOP n keywords to limit the number of rows in theΒ result.
6) CASE
The CASE expression goes through conditions and returns a value when the first condition is met. Example: A customer is a premium customer if the amount exceedsΒ 50.
SELECT customer_id,
SUM(amount) AS total_amount,
CASE
WHEN SUM(amount)>=50 THEN 'Premium Customer'
ELSE 'Standard Customer'
END AS customer_status
FROM payment
GROUP BY customer_id
ORDER BY total_amount ASC
7) JOINS
INNER JOIN that will result in the common rows between twoΒ tables.
LEFT JOINreturns all rows from the left table and the matching rows from the right table. If no matching rows are found in the right table, NULL is used. RIGHT JOIN is a vice-versa of leftΒ join.
FULL JOIN is a combination of left and right join. If no matching rows are found in the left or right table, NULL isΒ used.
SELECT c.first_name,
c.last_name,
c.customer_id,
SUM(p.amount)
FROM customer c
INNER JOIN payment p
ON c.customer_id = p.customer_id
GROUP BY c.first_name,
c.last_name,
c.customer_id
SELECT c.first_name,
c.last_name,
c.customer_id,
SUM(p.amount)
FROM customer c
LEFT JOIN payment p
ON c.customer_id = p.customer_id
GROUP BY c.first_name,
c.last_name,
c.customer_id
8) Subqueries
A subquery is a SQL query nested inside a larger query.Β Example:
SELECT payment_id,
amount,
(SELECT SUM(amount) FROM payment) AS total_amount
FROM payment
9) Windows Function withΒ Rank
Window functions apply to aggregate and ranking functions over a particular window (set of rows). OVER clause is used with window functions to define that window. It can be combined with PARTITION BY and ORDER BYΒ .
SELECT customer_id,
SUM(amount) AS total_amount,
RANK() OVER (ORDER BY SUM(amount) DESC)
FROM payment
GROUP BY customer_id
10) INSERT, UPDATE, DELETE andΒ TRUNCATE
As the name suggests INSERT is used to push one or more records into the table, while UPDATEis used to modify values in the table based on certain conditions provided by WHEREΒ clause.
DELETE and TRUNCATE both are used to remove records from the table where DELETE removed the record based on WHERE condition and TRUNCATE removed all the records in anyΒ table.
Note: We have used PostgreSQL, which is an open-source Relational DBMS, pgAdmin as a Client. Queries ran on βdvdrentalβ database, and I have mentioned all the resources about the database and PostgreSQL in the reference section.
References:
[1] PostgreSQL Official Page. https://www.postgresql.org/
[2] pgAdmin Official Page. https://www.pgadmin.org/
[3] PostgreSQL docker Image. https://hub.docker.com/_/postgres
[4] DVD Rental DB. https://www.postgresqltutorial.com/postgresql-getting-started/load-postgresql-sample-database/
Top 10 SQL Queries a Data Scientist should know was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
Join thousands of data leaders on the AI newsletter. Itβs free, we donβt spam, and we never share your email address. Keep up to date with the latest work in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.
Published via Towards AI