SQL for Data Scientists/Analysts

Last Updated on August 6, 2021 by Editorial Team

Author(s): Saniya Parveez

Introduction

SQL is an important part of data and data science for the storage and retrieval of data. With extensive data being gathered and churned out every day in the industries, as long as the data remains in a SQL-compliant database, SQL is still the ablest tool to help to investigate, filter, and aggregate data to get a thorough understanding of data.

I have seen many times that SQL is an underrated skill for data science because it has been taken for granted as a necessary yet uncool way of obtaining data out from the database to feed into pandas. People think that SQL is just SELECT, JOIN, and ORDER BY. But, it is way more than just SELECT, JOIN, and ORDER By statements.

SQL Tricks for Every Day

COALESCE() Function

This function is used to handle NULL values. It replaces NULL values with another desired value.

Let's take an Employee table:

In the above table, the city row has many null values. Let’s query the Null values of the city with a default value.

SELECT name, salary, city,
COALESCE(city, 'New Delhi') AS city_null_value
FROM  
Employee    
ORDER BY name;

Figure 2: Default value of city that has a NULL value

ROW_NUMBER() function

This function creates a unique incrementing integer value to each row of the result. This column of values is supposed pseudo-column as it does not naturally exist in our data table. Because of this, the result is returned in the order determined by the analysts in the ORDER BY clause.

Let’s create a rownumber in the Employee table.

SELECT 
  rownumber, 
  name, 
  salary,
  city
FROM 
(
  SELECT 
    ROW_NUMBER() OVER (PARTITION BY city ORDER BY name DESC) AS rownumber, 
    tab.*
  FROM Employee tab    
)    dat
ORDER BY name, rownumber;

Figure 3: row number based on the partitioned city

WITH Statement

It is used to define “statement scoped views”. These are not stored in the database schema. It is also called Common Table Expression (CTE) and subquery factoring.

Let’s take the OrderDetail table as below:

Now, query to return the average quantity ordered per ProductId.

WITH cte_quantity
AS
(SELECT
    SUM(Quantity) as Total
FROM OrderDetails
GROUP BY ProductID)
 
SELECT
    AVG(Total) average_product_quantity
FROM cte_quantity;

Output:

Figure 5: Average quantity ordered per productId

GROUP_CONCAT() function

It is used to concatenate data from multiple rows into one field. It returns a string with a concatenated non-NULL value from a group. It returns NULL when there are no non-NULL values.

Let’s take an example of the “Book master” table as below:

Query to return a list of comma(,) separated ‘cate_id’s for each group of ‘pub_id’.

SELECT pub_id,GROUP_CONCAT(cate_id)
FROM book_mast
GROUP BY pub_id;

Integrate SQL Query with Python

Panda is a wonderful library that gives one line code to query from SQL. Below code to query from Pandas:

query = "SELECT * FROM CURRENT_TABLE"
sql_data = pandas.read_sql(query, connection)

Conclusion

Data scientists or analysts should understand SQL. In fact, all professionals working with data and analytics should know SQL. SQL is still the most powerful tool to help you investigate, filter, and aggregate to get a thorough understanding of your data.

SQL for Data Scientists/Analysts was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

Frequently Used, Contextual References

Resources

Publication

SQL for Data Scientists/Analysts

Author(s): Saniya Parveez

Introduction

SQL Tricks for Every Day

COALESCE() Function

ROW_NUMBER() function

WITH Statement

GROUP_CONCAT() function

Integrate SQL Query with Python

Conclusion

Towards AI Team

Feedback ↓ Cancel reply

Popular posts

Best Laptops for Deep Learning, Machine Learning (ML), and Data Science for 2023

Best Workstations for Deep Learning, Data Science, and Machine Learning (ML) for 2022

Descriptive Statistics for Data-driven Decision Making with Python

Best Machine Learning (ML) Books - Free and Paid - Editorial Recommendations for 2022

Best Data Science Books - Free and Paid - Editorial Recommendations for 2022

Updates

Recent Posts

TAI #148: New API Models from OpenAI (4.1) & xAI (grok-3); Exploring Deep Research’s Scaling Laws

Traditional RAG vs Graph RAG

I Was About to Order Taco Bell Again. Instead, I Built an AI That Talks Me Down

MCP is on Fire.

Efficient Fine-Tuning of LLMs: LoRA and QLoRA in Enterprise AI LangGraph Workflows

The World’s Leading AI and Technology Publication.

Company

CONTACT US

🔥 Recommended Articles 🔥

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Publication

SQL for Data Scientists/Analysts

Author(s): Saniya Parveez

Introduction

SQL Tricks for Every Day

COALESCE() Function

ROW_NUMBER() function

WITH Statement

GROUP_CONCAT() function

Integrate SQL Query with Python

Conclusion

Towards AI Team

Related posts

Feedback ↓ Cancel reply

Popular posts

Updates

Recent Posts

The World’s Leading AI and Technology Publication.

Company

CONTACT US

GDPR CCPA Statement

Subscribe to our AI newsletter!

🔥 Recommended Articles 🔥