The Sigmoid Function: Foundation of Neural Networks
Last Updated on September 29, 2025 by Editorial Team
Author(s): Niraj
Originally published on Towards AI.
Series: Foundation of AI — Blog 1

Every modern neural network stands on mathematical pillars.
One of the most important is the sigmoid activation function.
It’s not just a formula; it’s the bridge between linear math and nonlinear learning.
What is the Sigmoid?
Defined as:
σ(z) = 1 / (1 + e⁻ᶻ)
It takes any real number and compresses it into a value between 0 and 1. Think of it as a soft decision-maker: instead of a hard “True/False”, it answers “how likely is True?”.
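To see the squashing behavior concretely, here is a minimal NumPy sketch (the function name and sample inputs are my own choices for illustration):

```python
import numpy as np

def sigmoid(z):
    """Compress any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and z = 0 lands exactly on 0.5.
print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# ≈ [4.54e-05, 0.269, 0.5, 0.731, 0.99995]
```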
Why Sigmoid Matters
Before nonlinear activations like the sigmoid, models could only draw linear decision boundaries. With the sigmoid, a neuron’s output can be read as a probability, and networks can learn curved decision surfaces. It gave neural networks a smooth, differentiable way to handle classification.
The sigmoid’s ability to output values between 0 and 1 makes it ideal for:
- Probability estimation — interpreting outputs as likelihoods
- Binary classification — distinguishing between two classes
- Gradient-based learning — enabling smooth weight updates
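As a sketch of the binary-classification use above, consider a single neuron scoring one example (the weights, bias, and input below are made-up values, not from any trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -0.4])         # hypothetical weights
b = 0.1                           # hypothetical bias
x = np.array([1.5, 2.0])          # one input example

p = sigmoid(np.dot(w, x) + b)     # read the output as P(class = 1)
label = int(p > 0.5)              # threshold at 0.5 for a hard decision
print(f"P(class=1) ≈ {p:.3f} -> predicted label {label}")
```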
Derivative: The Learning Engine
The mathematical elegance lies in how the sigmoid changes during training. Its derivative is simple yet profound:
dσ/dz = σ(z) ⋅ (1 − σ(z))
This compact formula is what lets gradients flow backward through a network, enabling backpropagation. Without it, deep learning might have remained a theoretical idea rather than a practical tool.
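In code, the identity means the gradient costs almost nothing once the forward pass is done; a minimal sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """σ'(z) = σ(z)(1 − σ(z)): one extra multiply once σ(z) is known."""
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))  # 0.25, the steepest slope, reached at z = 0
```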
How We Derive This
Step 1: Start with the function
σ(z) = (1 + e⁻ᶻ)⁻¹
Step 2: Apply the Chain Rule
dσ/dz = -1 ⋅ (1 + e⁻ᶻ)⁻² ⋅ d/dz(1 + e⁻ᶻ)
Step 3: Differentiate the Inner Function
d/dz(1 + e⁻ᶻ) = -e⁻ᶻ
Step 4: Combine the Results
dσ/dz = -1 ⋅ (1 + e⁻ᶻ)⁻² ⋅ (-e⁻ᶻ) = e⁻ᶻ / (1 + e⁻ᶻ)²
Step 5: Express in Terms of σ(z)
Notice that:
- σ(z) = 1 / (1 + e⁻ᶻ)
- 1 − σ(z) = (1 + e⁻ᶻ − 1) / (1 + e⁻ᶻ) = e⁻ᶻ / (1 + e⁻ᶻ)
Multiplying them gives:
σ(z) ⋅ (1 − σ(z)) = [1 / (1 + e⁻ᶻ)] ⋅ [e⁻ᶻ / (1 + e⁻ᶻ)] = e⁻ᶻ / (1 + e⁻ᶻ)²
Final Result: dσ/dz = σ(z) ⋅ (1 − σ(z))
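One quick way to trust the algebra is to compare the closed form against a numerical finite-difference estimate; a small sketch (the step size h and the test points are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)
h = 1e-6                                                 # small step
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)    # central difference
analytic = sigmoid(z) * (1.0 - sigmoid(z))               # the closed form
print(np.max(np.abs(numeric - analytic)))                # tiny, ~1e-10 or smaller
```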
Why This Matters for Learning
This derivative is computationally efficient because it reuses the neuron’s current output. During backpropagation, it determines how much each weight should change, making neural network training practical and efficient.
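To show the derivative doing real work, here is a sketch of training a single neuron by gradient descent on a squared-error loss (the toy data, learning rate, and iteration count are all made up for illustration, not a recommended recipe):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy, made-up data: two features per example, binary labels.
X = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 1.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

w, b, lr = np.zeros(2), 0.0, 0.5

for _ in range(5000):
    p = sigmoid(X @ w + b)          # forward pass
    dL_dp = p - y                   # squared-error loss gradient (up to a constant)
    dp_dz = p * (1.0 - p)           # the sigmoid derivative, reusing the forward output
    grad_z = dL_dp * dp_dz / len(y)
    w -= lr * (X.T @ grad_z)        # chain rule through z = Xw + b
    b -= lr * grad_z.sum()

print(np.round(sigmoid(X @ w + b), 2))  # predictions should drift toward [0, 1, 0, 1]
```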
The sigmoid function demonstrated that neural networks could learn from data through mathematical optimization, paving the way for modern deep learning.
Next in series: The limitations of sigmoid and the evolution to modern activation functions.