
Z-Score Standardization & StandardScaler

Last Updated on October 15, 2025 by Editorial Team

Author(s): Amna Sabahat

Originally published on Towards AI.

You’ve cleaned your data, handled missing values, and are ready to build a powerful machine learning model. But there’s one critical step left: feature scaling. If you’ve ever wondered why your K-Nearest Neighbors model performs poorly or your Neural Network takes forever to train, unscaled data is likely the culprit.

In this comprehensive guide, we’ll dive deep into Z-Score Standardization — one of the most effective scaling techniques — and its practical implementation using StandardScaler.

Before we dive into Z-Score, let’s understand the two fundamental concepts that make it work.


What is the Mean?

The mean (often called the “average”) is the most common measure of central tendency. It represents the typical value in your dataset.

Formula:
μ = (Σx) / N

Where:

  • μ (mu) = Mean
  • Σx = Sum of all values in the dataset
  • N = Total number of values

Example:
Let’s calculate the mean of this dataset: [10, 20, 30, 40, 50]

μ = (10 + 20 + 30 + 40 + 50) / 5 = 150 / 5 = 30

So, the mean is 30. This tells us the “center” of our data is around 30.

What is Standard Deviation?

The standard deviation measures how spread out your data is from the mean. It tells you how much variation or dispersion exists in your dataset.

Formula:
σ = √[Σ(x - μ)² / (N-1)]

Where:

  • σ (sigma) = Standard Deviation
  • x = Each individual value
  • μ = Mean of the dataset
  • N = Total number of values

Let’s break this down step by step:

Step-by-Step Calculation for [10, 20, 30, 40, 50]:

  1. Calculate mean: μ = 30 (as shown above)
  2. Find differences from mean:
  • 10 - 30 = -20
  • 20 - 30 = -10
  • 30 - 30 = 0
  • 40 - 30 = 10
  • 50 - 30 = 20

3. Square the differences:

  • (-20)² = 400
  • (-10)² = 100
  • (0)² = 0
  • (10)² = 100
  • (20)² = 400

4. Sum the squared differences: 400 + 100 + 0 + 100 + 400 = 1000

5. Divide by N-1: 1000 / 4 = 250

6. Take square root: √250 ≈ 15.81

So, the standard deviation is ≈15.81

What does this mean?

  • A low standard deviation means data points are close to the mean
  • A high standard deviation means data points are spread out over a wider range
  • In our example, most values are within ±15.81 units from the mean of 30
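The mean and standard deviation above can be checked quickly with NumPy (a minimal sketch; `ddof=1` selects the sample standard deviation, matching the N-1 divisor in the hand calculation):

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

mean = data.mean()             # (10 + 20 + 30 + 40 + 50) / 5 = 30.0
sample_std = data.std(ddof=1)  # sqrt(1000 / 4) = sqrt(250) ≈ 15.81

print(mean, round(sample_std, 2))
```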

What is Z-Score Standardization?

The Concept

Now that we understand mean and standard deviation, Z-Score Standardization becomes much clearer. It’s a statistical method that transforms your data to have a mean of 0 and a standard deviation of 1. It’s like centering your data around zero and making the spread consistent across all features.

The Mathematical Formula

The transformation is beautifully simple:

z = (x - μ) / σ

Where:

  • x = Original value
  • μ (mu) = Mean of the feature
  • σ (sigma) = Standard deviation of the feature
  • z = Standardized value (z-score)

Why Does This Matter?

Let’s break this down with our same example:

Suppose we have a feature with values: [10, 20, 30, 40, 50]

We already calculated:

  • Mean (μ) = 30
  • Standard Deviation (σ) ≈ 15.81

Now apply the Z-Score formula:

  • For value 10: (10 - 30) / 15.81 ≈ -1.26
  • For value 20: (20 - 30) / 15.81 ≈ -0.63
  • For value 30: (30 - 30) / 15.81 = 0
  • For value 40: (40 - 30) / 15.81 ≈ 0.63
  • For value 50: (50 - 30) / 15.81 ≈ 1.26

Our transformed data becomes: [-1.26, -0.63, 0, 0.63, 1.26]

What just happened?

  • The mean shifted from 30 to 0
  • The spread normalized — each value now represents how many standard deviations it is away from the mean
  • Value -1.26 means it’s 1.26 standard deviations below the mean
  • Value 1.26 means it’s 1.26 standard deviations above the mean
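The whole transformation can be reproduced in a few lines of NumPy (a sketch using the sample standard deviation, so the numbers match the hand calculation above):

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])
mu = data.mean()            # 30.0
sigma = data.std(ddof=1)    # ≈ 15.81 (sample standard deviation)

z = (data - mu) / sigma     # each value's distance from the mean, in std units
print(np.round(z, 2))       # [-1.26 -0.63  0.    0.63  1.26]
```

Note that the transformed data has mean 0, exactly as the theory predicts.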

Why Use Z-Score Standardization? The Theory Behind the Magic

1. Algorithms Sensitive to Feature Scales

Z-score standardization is crucial for algorithms that rely on distance calculations or gradient-based optimization:

  • Support Vector Machines (SVM): Uses distance to define margins
  • K-Nearest Neighbors (K-NN): Relies on Euclidean distance
  • Neural Networks: Gradient-based optimization converges faster
  • K-Means Clustering: Distance to centroids matters
  • Principal Component Analysis (PCA): Finds directions of maximum variance

2. When Your Data Contains Mild Outliers

Unlike Min-Max Scaling, Z-Score is less sensitive to outliers because it uses the standard deviation rather than min/max range.

3. When You Need Interpretable Features

After standardization, feature values represent their position relative to the mean. A value of 1.5 means “1.5 standard deviations above the mean.”

4. For Gradient-Based Optimization

Algorithms like Linear Regression, Logistic Regression, and Neural Networks benefit greatly.

When Should You Avoid Z-Score Standardization?

1. When You Require Fixed Range Output

Z-Score doesn’t bound your data to a specific range. Results can be any real number, which might be problematic for some applications.

2. With Significant Outliers

While more robust than Min-Max, Z-Score can still be affected by extreme outliers since mean and standard deviation are influenced by them.

3. When Data is Not Approximately Gaussian

Z-Score works best when your data is roughly normally distributed. For heavily skewed distributions, consider other transformations first.

4. With Sparse Data

Subtracting the mean transforms zero values into non-zero ones, destroying the sparsity of sparse datasets.

StandardScaler: The Practical Implementation

Now that we understand the theory, let’s see how to implement Z-Score standardization in practice using scikit-learn’s StandardScaler.

Why Use StandardScaler Instead of Manual Calculation?

While you could implement Z-score manually, StandardScaler provides crucial advantages:

  • Prevents Data Leakage: The biggest reason to use StandardScaler
  • Pipeline Integration: Works seamlessly with scikit-learn workflows
  • Efficiency: Handles the entire process automatically
  • Consistency: Reduces human error in calculations
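Here is a minimal usage sketch. One detail worth knowing: StandardScaler divides by the population standard deviation (N in the denominator, not N-1), so its z-scores differ slightly from the sample-standard-deviation values computed by hand earlier:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# scikit-learn expects 2-D input: one column per feature
X = np.array([[10], [20], [30], [40], [50]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learns mean and std, then transforms

print(scaler.mean_)      # [30.]
# Population std = sqrt(1000 / 5) ≈ 14.14, so z-scores are ≈ ±1.41, ±0.71, 0
print(np.round(X_scaled.ravel(), 2))
```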

⚠️ This is the most important concept in this article:

Never fit your scaler on the entire dataset!

Why This Matters: Data Leakage

If you fit your scaler on the entire dataset (including test data), you’re “peeking” at the test set during training. This gives you overly optimistic performance estimates and models that fail in production.

from sklearn.preprocessing import StandardScaler

# WRONG - data leakage: the scaler sees the test data
scaler = StandardScaler()
scaler.fit(all_data)  # Includes test data!
train_scaled = scaler.transform(train_data)
test_scaled = scaler.transform(test_data)

# CORRECT - no data leakage: statistics come from training data alone
scaler = StandardScaler()
scaler.fit(train_data)  # Only training data
train_scaled = scaler.transform(train_data)
test_scaled = scaler.transform(test_data)  # Same scaler, same parameters
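The correct pattern can be sketched end to end with scikit-learn (the toy data and split parameters here are illustrative assumptions, not from the article):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy single-feature dataset (illustrative values)
X = np.arange(20, dtype=float).reshape(-1, 1)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same parameters

# The scaler's statistics come from the training split alone
print(scaler.mean_, scaler.scale_)
```

After fitting, the training data has mean 0 and standard deviation 1 by construction; the test data will be close to that, but not exact, which is expected and correct.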

Conclusion:

Through this comprehensive guide, we’ve seen that Z-Score standardization is a powerful technique, but it’s not a one-size-fits-all solution. Here’s your decision framework:

Use Z-Score Standardization when:

  • Working with distance-based algorithms (SVM, K-NN, K-Means)
  • Using gradient-based optimization (Neural Networks, Linear Models)
  • Your data is approximately normally distributed
  • You need interpretable feature contributions

Consider alternatives when:

  • Data has extreme outliers (use RobustScaler)
  • You need specific output ranges (use MinMaxScaler)
  • Working with tree-based models (often no scaling needed)
  • Dealing with sparse data (use MaxAbsScaler)
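The alternatives above can be compared side by side on a toy feature with one extreme outlier (an illustrative sketch; the data is made up):

```python
import numpy as np
from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler,
                                   RobustScaler, StandardScaler)

# One extreme value (100) dominates the min/max range
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler(), MaxAbsScaler()):
    scaled = scaler.fit_transform(X).ravel()
    print(type(scaler).__name__, np.round(scaled, 2))
```

RobustScaler (median and IQR) keeps the non-outlier values well separated, while StandardScaler and MinMaxScaler squash them together because the outlier inflates their statistics.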

Remember the golden rule:

Always fit your scaler on training data only and use the same parameters to transform your test data.

Now you’re ready to scale your way to better models!


Published via Towards AI

