Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Read by thought-leaders and decision-makers around the world. Phone Number: +1-650-246-9381 Email: [email protected]
228 Park Avenue South New York, NY 10003 United States
Website: Publisher: https://towardsai.net/#publisher Diversity Policy: https://towardsai.net/about Ethics Policy: https://towardsai.net/about Masthead: https://towardsai.net/about
Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Founders: Roberto Iriondo, , Job Title: Co-founder and Advisor Works for: Towards AI, Inc. Follow Roberto: X, LinkedIn, GitHub, Google Scholar, Towards AI Profile, Medium, ML@CMU, FreeCodeCamp, Crunchbase, Bloomberg, Roberto Iriondo, Generative AI Lab, Generative AI Lab Denis Piffaretti, Job Title: Co-founder Works for: Towards AI, Inc. Louie Peters, Job Title: Co-founder Works for: Towards AI, Inc. Louis-François Bouchard, Job Title: Co-founder Works for: Towards AI, Inc. Cover:
Towards AI Cover
Logo:
Towards AI Logo
Areas Served: Worldwide Alternate Name: Towards AI, Inc. Alternate Name: Towards AI Co. Alternate Name: towards ai Alternate Name: towardsai Alternate Name: towards.ai Alternate Name: tai Alternate Name: toward ai Alternate Name: toward.ai Alternate Name: Towards AI, Inc. Alternate Name: towardsai.net Alternate Name: pub.towardsai.net
5 stars – based on 497 reviews

Frequently Used, Contextual References

TODO: Remember to copy unique IDs whenever it needs used. i.e., URL: 304b2e42315e

Resources

Take our 85+ lesson From Beginner to Advanced LLM Developer Certification: From choosing a project to deploying a working product this is the most comprehensive and practical LLM course out there!

Publication

SQL for Data Scientists/Analysts
Latest

SQL for Data Scientists/Analysts

Last Updated on August 6, 2021 by Editorial Team

Author(s): Saniya Parveez

Introduction

SQL is an important part of data and data science for the storage and retrieval of data. With extensive data being gathered and churned out every day in the industries, as long as the data remains in a SQL-compliant database, SQL is still the ablest tool to help to investigate, filter, and aggregate data to get a thorough understanding ofΒ data.

I have seen many times that SQL is an underrated skill for data science because it has been taken for granted as a necessary yet uncool way of obtaining data out from the database to feed into pandas. People think that SQL is just SELECT, JOIN, and ORDER BY. But, it is way more than just SELECT, JOIN, and ORDER By statements.

SQL Tricks for EveryΒ Day

COALESCE() Function

This function is used to handle NULL values. It replaces NULL values with another desiredΒ value.

Let's take an EmployeeΒ table:

Figure 1: City with NullΒ values

In the above table, the city row has many null values. Let’s query the Null values of the city with a defaultΒ value.

SELECT name, salary, city,
COALESCE(city, 'New Delhi') AS city_null_value
FROM
Employee
ORDER BY name;
Figure 2: Default value of city that has a NULLΒ value

ROW_NUMBER() function

This function creates a unique incrementing integer value to each row of the result. This column of values is supposed pseudo-column as it does not naturally exist in our data table. Because of this, the result is returned in the order determined by the analysts in the ORDER BYΒ clause.

Let’s create a rownumber in the EmployeeΒ table.

SELECT 
rownumber,
name,
salary,
city
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY city ORDER BY name DESC) AS rownumber,
tab.*
FROM Employee tab
) dat
ORDER BY name, rownumber;
Figure 3: row number based on the partitioned city

WITH Statement

It is used to define β€œstatement scoped views”. These are not stored in the database schema. It is also called Common Table Expression (CTE) and subquery factoring.

Let’s take the OrderDetail table asΒ below:

Figure 4: OrderDetail Table

Now, query to return the average quantity ordered per ProductId.

WITH cte_quantity
AS
(SELECT
SUM(Quantity) as Total
FROM OrderDetails
GROUP BY ProductID)

SELECT
AVG(Total) average_product_quantity
FROM cte_quantity;

Output:

Figure 5: Average quantity ordered per productId

GROUP_CONCAT() function

It is used to concatenate data from multiple rows into one field. It returns a string with a concatenated non-NULL value from a group. It returns NULL when there are no non-NULLΒ values.

Let’s take an example of the β€œBook master” table asΒ below:

Figure 6: Book MasterΒ Table

Query to return a list of comma(,) separated β€˜cate_id’s for each group of β€˜pub_id’.

SELECT pub_id,GROUP_CONCAT(cate_id)
FROM book_mast
GROUP BY pub_id;
Figure 7: Comma Separated value

Integrate SQL Query withΒ Python

Panda is a wonderful library that gives one line code to query from SQL. Below code to query fromΒ Pandas:

query = "SELECT * FROM CURRENT_TABLE"
sql_data = pandas.read_sql(query, connection)

Conclusion

Data scientists or analysts should understand SQL. In fact, all professionals working with data and analytics should know SQL. SQL is still the most powerful tool to help you investigate, filter, and aggregate to get a thorough understanding of yourΒ data.


SQL for Data Scientists/Analysts was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

Feedback ↓