Eigen Vectors & Spectral Decomposition
Last Updated on May 27, 2026 by Editorial Team
Author(s): Taru Vaid
Originally published on Towards AI.
The core idea of spectral decomposition is to break a matrix into a set of simpler, independent pieces — each piece being a direction and a strength.
Every piece says: “in this direction, the matrix acts like simple multiplication by this number.” The full matrix is just the sum of all these pieces added together.
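To make “sum of pieces” concrete for the simplest case, a real symmetric matrix, here is a minimal NumPy sketch. The matrix is chosen only for illustration. For symmetric matrices each piece is an eigenvalue λᵢ times the outer product qᵢqᵢᵀ of its unit eigenvector. General diagonalizable matrices need a slightly different pairing of left and right eigenvectors, which this sketch does not cover.

```python
import numpy as np

# A small real symmetric matrix, chosen only for illustration.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# eigh is NumPy's solver for symmetric matrices: it returns real eigenvalues
# (ascending) and orthonormal eigenvectors as the columns of Q.
eigenvalues, Q = np.linalg.eigh(A)

# One piece per eigenpair: strength lam times the outer product q q^T (rank one).
pieces = [lam * np.outer(q, q) for lam, q in zip(eigenvalues, Q.T)]

# Adding the pieces back together recovers A.
A_rebuilt = sum(pieces)
print(np.allclose(A_rebuilt, A))                         # True
print([int(np.linalg.matrix_rank(p)) for p in pieces])   # [1, 1]
```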
The word “spectral” comes from the spectrum of a matrix — the set of its eigenvalues, just as white light has a spectrum of frequencies. Decomposing a matrix spectrally is like passing light through a prism — you separate what was mixed into its pure components.
Eigenvectors are a core idea in spectral decomposition.

Most vectors, when you apply a matrix to them, change direction. They rotate, they tilt, they end up pointing somewhere new.
Eigenvectors are the special vectors that refuse to change direction. When the matrix acts on them, they only stretch or shrink — they stay pointing exactly the same way (or exactly opposite, if the eigenvalue is negative).
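In symbols, v is an eigenvector of A with eigenvalue λ when Av = λv. Below is a short NumPy check of that behavior. It uses a 2×2 matrix picked only for illustration and a small hypothetical helper `cosine` that measures the angle between two vectors.

```python
import numpy as np

# A 2x2 matrix chosen only for illustration (not symmetric, but diagonalizable).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

def cosine(u, v):
    """Cosine of the angle between u and v (+1 or -1 means the same line)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# An ordinary vector: after applying A it points somewhere new.
x = np.array([1.0, -1.0])
print(cosine(x, A @ x))          # about 0.894, so the direction changed

# An eigenvector: A only scales it, by its eigenvalue.
eigenvalues, P = np.linalg.eig(A)
v, lam = P[:, 0], eigenvalues[0]
print(np.allclose(A @ v, lam * v))               # True
print(np.isclose(abs(cosine(v, A @ v)), 1.0))    # True: same (or opposite) direction
```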
Hence, eigenvectors are completely tied to the specific transformation the matrix represents. They are the skeleton of that transformation. Different matrix = different skeleton.
There are many types of spectral decompositions.
Eigen decomposition: for square matrices that have a full set of linearly independent eigenvectors (n of them for an n×n matrix). Writes A = PDP⁻¹, where D is diagonal with the eigenvalues on its diagonal and P holds the corresponding eigenvectors as columns. This requires the matrix to be diagonalizable, and not all square matrices qualify.
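Here is a minimal NumPy sketch of A = PDP⁻¹. It assumes a small diagonalizable matrix picked for illustration. `np.linalg.eig` returns the eigenvalues and the eigenvector matrix P.

```python
import numpy as np

# A diagonalizable (but not symmetric) matrix, chosen only for illustration.
A = np.array([[1.0, 2.0],
              [3.0, 2.0]])

# eig returns the eigenvalues and a matrix P whose columns are the eigenvectors.
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)

# A = P D P^{-1} works because P's columns are linearly independent (P is invertible).
A_rebuilt = P @ D @ np.linalg.inv(P)
print(np.allclose(A_rebuilt, A))   # True
print(np.sort(eigenvalues))        # the eigenvalues -1 and 4
```

If P were singular (too few independent eigenvectors), `np.linalg.inv(P)` would fail or be numerically unreliable. That is exactly the non-diagonalizable case.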

Spectral theorem decomposition: the clean special case of eigen decomposition when the matrix is real symmetric (or Hermitian, for complex matrices). It guarantees real eigenvalues and an orthonormal set of eigenvectors, and writes A = QDQᵀ, where Q is orthogonal, so Q⁻¹ is simply Qᵀ (for Hermitian matrices, A = QDQ* with Q unitary). This is the version that appears in covariance matrices, the neural tangent kernel (NTK), and kernel matrices. Its limitation is that it needs a symmetric matrix. Singular value decomposition comes to the rescue and generalizes the idea to non-symmetric, and even non-square, matrices.
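For symmetric matrices, NumPy provides `np.linalg.eigh`, which exploits the symmetry. The sketch below assumes a covariance matrix estimated from random data, purely for illustration. Note that Qᵀ plays the role of P⁻¹.

```python
import numpy as np

rng = np.random.default_rng(0)

# A covariance matrix is symmetric; here it is estimated from random data
# (500 samples, 3 features), used only for illustration.
X = rng.normal(size=(500, 3))
S = np.cov(X, rowvar=False)

# eigh: real eigenvalues in ascending order, orthonormal eigenvectors as columns of Q.
eigenvalues, Q = np.linalg.eigh(S)

print(np.allclose(Q.T @ Q, np.eye(3)))                 # True: Q is orthogonal
print(np.allclose(Q @ np.diag(eigenvalues) @ Q.T, S))  # True: S = Q D Q^T
```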
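Singular Value Decomposition (SVD): any matrix A, symmetric or not (and even rectangular), becomes symmetric when multiplied by its transpose. Both AᵀA and AAᵀ are symmetric positive semidefinite, so the spectral theorem applies to them.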

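The singular values of A are the square roots of the eigenvalues of AᵀA (equivalently, of AAᵀ). The two products share the same nonzero eigenvalues regardless of the order of multiplication; when A is rectangular, the larger product simply has extra zero eigenvalues. Their eigenvectors, however, differ and depend on the order: the eigenvectors of AᵀA are the right singular vectors (the columns of V in A = UΣVᵀ), and those of AAᵀ are the left singular vectors (the columns of U).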
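These relationships can be checked numerically. The sketch assumes a random 4×2 matrix. It is rectangular on purpose, so the two products have different sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 2))                      # rectangular 4x2 matrix, for illustration

U, s, Vt = np.linalg.svd(A)                      # singular values s in descending order

evals_AtA, evecs_AtA = np.linalg.eigh(A.T @ A)   # 2x2 product, ascending order
evals_AAt = np.linalg.eigvalsh(A @ A.T)          # 4x4 product, ascending order
evals_AtA, evecs_AtA = evals_AtA[::-1], evecs_AtA[:, ::-1]   # reorder to descending
evals_AAt = evals_AAt[::-1]

# Singular values are the square roots of the eigenvalues of A^T A.
print(np.allclose(s, np.sqrt(evals_AtA)))        # True
# A A^T shares those nonzero eigenvalues and adds 4 - 2 = 2 zeros.
print(np.allclose(evals_AAt[:2], evals_AtA), np.allclose(evals_AAt[2:], 0.0))  # True True
# Eigenvectors of A^T A are the right singular vectors (rows of Vt), up to sign.
print(np.allclose(np.abs(Vt @ evecs_AtA), np.eye(2)))   # True
```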
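SVD is the most general form: it works for any matrix, including rectangular ones. It writes A = UΣVᵀ, where U and V are orthogonal and Σ is diagonal (rectangular, for a non-square A) with non-negative entries, the singular values. When A is symmetric positive definite, SVD and the spectral theorem give the same factorization: the singular values are the eigenvalues, and U = V = Q (up to ordering and the choice of eigenvectors).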
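Below is a NumPy sketch of the full SVD of a rectangular matrix, including the rectangular Σ. It also covers the symmetric positive definite case. The matrices are random and only illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 5))             # rectangular: SVD still applies

# full_matrices=True gives U (3x3) and Vt (5x5); s holds the 3 singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the rectangular "diagonal" Sigma (3x5) with the singular values on its diagonal.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U @ Sigma @ Vt, A))   # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(3)), np.allclose(Vt @ Vt.T, np.eye(5)))  # True True

# For a symmetric positive definite matrix, the singular values are the eigenvalues.
B = A @ A.T + np.eye(3)                 # symmetric positive definite by construction
print(np.allclose(np.linalg.svd(B, compute_uv=False),
                  np.linalg.eigvalsh(B)[::-1]))   # True
```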

SVD and the Fourier transform are closely related ideas. Both decompose a matrix or a signal into a sum of simpler, structured components that reveal hidden geometry.
In SVD, a matrix A is factored so that the linear transformation it represents reads as “rotate → scale → rotate” (a “rotation” may include a reflection). The basis vectors are computed from the matrix itself, that is, from the data. In the Fourier transform, a function is decomposed into fixed orthogonal sinusoidal basis functions (sines and cosines). Each basis function represents a frequency component, and its coefficient tells you how much of that frequency is present.
Hence both methods are essentially change-of-basis tools. SVD finds an optimal, data-dependent pair of bases (one for the inputs, one for the outputs) that diagonalizes a matrix. Fourier uses a fixed, global basis that diagonalizes every convolution, that is, every linear shift-invariant system. In both cases, complex structure becomes a sum of independent modes. For many real-world matrices and signals, the energy also concentrates in a few dominant components, which makes compression, denoising, and analysis much easier.
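The claim that the Fourier basis diagonalizes shift-invariant systems can be checked directly. A circular convolution is multiplication by a circulant matrix, and the DFT matrix diagonalizes every circulant matrix. Below is a minimal NumPy sketch with a random filter `c`, chosen for illustration.

```python
import numpy as np

n = 6
rng = np.random.default_rng(3)
c = rng.normal(size=n)          # filter, i.e. the first column of the circulant matrix

# Circulant matrix: C[i, j] = c[(i - j) mod n], so C @ x is a circular convolution.
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

# DFT matrix F[k, j] = exp(-2*pi*1j*k*j/n): applying np.fft.fft to the identity builds it.
F = np.fft.fft(np.eye(n))
F_inv = F.conj() / n

# The fixed Fourier basis diagonalizes the circulant matrix,
# with the DFT of c on the diagonal.
print(np.allclose(F @ C @ F_inv, np.diag(np.fft.fft(c))))   # True

# Equivalently: convolution in the original basis = pointwise product in the Fourier basis.
x = rng.normal(size=n)
print(np.allclose(C @ x, np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real))  # True
```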
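Schur decomposition: works for any square matrix, even ones that can’t be diagonalized. Writes A = QTQ*, where Q is unitary (Q* is its conjugate transpose) and T is upper triangular with the eigenvalues of A on its diagonal. It is less interpretable than SVD, but unlike eigen decomposition it always exists, which makes it useful as a theoretical tool.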
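The sketch below computes a Schur form with SciPy's `scipy.linalg.schur`. SciPy names the unitary factor Z rather than Q, and with `output="complex"` it returns a genuinely upper-triangular T. The 2×2 matrix is chosen for illustration: it has a repeated eigenvalue with only one eigenvector, so it cannot be diagonalized.

```python
import numpy as np
from scipy.linalg import schur

# Repeated eigenvalue 2 with a single eigenvector: NOT diagonalizable.
A = np.array([[1.0, 1.0],
              [-1.0, 3.0]])

# Complex Schur form: A = Z T Z^*, with Z unitary and T upper triangular.
T, Z = schur(A, output="complex")

print(np.allclose(Z @ T @ Z.conj().T, A))       # True: the factorization exists anyway
print(np.allclose(np.tril(T, k=-1), 0.0))       # True: T is upper triangular
print(np.allclose(Z.conj().T @ Z, np.eye(2)))   # True: Z is unitary
print(np.diag(T))                               # the eigenvalues (both about 2)
```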
Jordan normal form: the fallback when a matrix genuinely can’t be diagonalized. This happens when a repeated eigenvalue has fewer independent eigenvectors than its multiplicity. Writes A = PJP⁻¹, where J is block diagonal; each Jordan block has a single eigenvalue repeated on its diagonal and 1s just above it. Rarely used in ML, but helpful for understanding why some initializations cause gradient pathologies.
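Here is a small illustration that assumes a hand-picked change of basis P. The matrix A below equals PJP⁻¹ for the Jordan block J with eigenvalue 2, so it has a repeated eigenvalue but only one eigenvector direction. NumPy's `eig` still returns two columns, but they are almost parallel.

```python
import numpy as np

# Jordan block for eigenvalue 2 (multiplicity 2, but only one eigenvector).
J = np.array([[2.0, 1.0],
              [0.0, 2.0]])
# An invertible change of basis, chosen only for illustration.
P = np.array([[1.0, 2.0],
              [1.0, 3.0]])
A = np.array([[1.0, 1.0],
              [-1.0, 3.0]])

print(np.allclose(A, P @ J @ np.linalg.inv(P)))     # True: A = P J P^{-1}
print(np.linalg.matrix_rank(A - 2.0 * np.eye(2)))   # 1: a single eigenvector direction

# eig still returns two columns, but they are (numerically) almost parallel,
# so P_eig is nearly singular and P_eig D P_eig^{-1} is unreliable.
eigenvalues, P_eig = np.linalg.eig(A)
print(np.linalg.cond(P_eig) > 1e6)                  # True
```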
Polar decomposition: writes any square matrix as A = QS, where Q is orthogonal (rotation/reflection) and S is symmetric positive semidefinite (pure stretching). It is closely related to SVD: if A = UΣVᵀ, then Q = UVᵀ and S = VΣVᵀ. This is useful for understanding the geometry of a transformation, because the rotation part and the stretching part are cleanly separated.
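The SVD route to the polar factors takes only a few lines. The sketch assumes a random square matrix. SciPy also provides `scipy.linalg.polar` for the same factorization.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))          # any square matrix, for illustration

# From A = U Sigma V^T, regroup as (U V^T)(V Sigma V^T).
U, s, Vt = np.linalg.svd(A)
Q = U @ Vt                           # orthogonal factor: rotation/reflection
S = Vt.T @ np.diag(s) @ Vt           # symmetric PSD factor: pure stretching

print(np.allclose(Q @ S, A))                     # True: A = Q S
print(np.allclose(Q.T @ Q, np.eye(3)))           # True: Q is orthogonal
print(np.allclose(S, S.T))                       # True: S is symmetric
print(np.allclose(S @ S, A.T @ A))               # True: S is the square root of A^T A
```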
Spectral decomposition is used widely in machine learning and deep learning.
Eigen decomposition: PCA (Principal Component Analysis) is an eigen decomposition of the covariance matrix; the eigenvectors are the principal directions and the eigenvalues are the variances along them. Whenever you linearly project high-dimensional data onto its top principal components before feeding it to a model, that’s eigen decomposition at work. Eigen decomposition also appears in graph neural networks via the graph Laplacian.
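Below is a sketch of PCA computed two ways. It assumes toy random data, and the stretching factors are arbitrary. The first route is an eigen decomposition of the covariance matrix. The second is an SVD of the centered data, a common way to compute PCA in practice. Both give the same variances.

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy data (hypothetical): 200 samples in 3-D, stretched so one direction dominates.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])
Xc = X - X.mean(axis=0)                        # PCA works on centered data

# Route 1: eigen decomposition of the (symmetric) covariance matrix.
cov = Xc.T @ Xc / (len(Xc) - 1)
eigenvalues, vecs = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]          # largest variance first
eigenvalues, vecs = eigenvalues[order], vecs[:, order]

# Route 2: SVD of the centered data matrix.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(eigenvalues, s**2 / (len(Xc) - 1)))   # True: same variances

# Project onto the top-2 principal components: 3-D -> 2-D.
Z = Xc @ vecs[:, :2]
print(Z.shape)                                 # (200, 2)
```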
Spectral theorem decomposition: the NTK, evaluated on a set of training inputs, is a symmetric positive semidefinite matrix. In the NTK regime (very wide networks trained by gradient descent on a squared loss), its eigen decomposition via the spectral theorem determines which directions in function space are learned fast and which are learned slowly. Spectral bias, the tendency of networks to learn low-frequency functions first, can be read as a statement about the eigenvalue spectrum of the NTK. Large eigenvalues → fast learning. Small eigenvalues → slow learning.
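Here is a toy sketch of that mechanism under the standard linearized picture. Take a model linearized around its initialization and trained by gradient descent on 0.5·‖f(X) − y‖². Its training residual r evolves as r ← (I − ηK)r, where K is the NTK Gram matrix. The matrix K below is a random symmetric PSD stand-in, not an actual NTK.

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical stand-in for an NTK Gram matrix on 5 training inputs (symmetric PSD).
G = rng.normal(size=(5, 5))
K = G @ G.T / 5

lam, Q = np.linalg.eigh(K)          # spectral theorem: K = Q diag(lam) Q^T, lam ascending
eta = 1.0 / lam.max()               # step size: every factor 1 - eta*lam lies roughly in [0, 1)
steps = 50

r0 = rng.normal(size=5)             # initial residual f(X) - y on the training inputs
r = r0.copy()
for _ in range(steps):
    # Linearized gradient descent on 0.5 * ||f(X) - y||^2: r <- (I - eta K) r
    r = r - eta * (K @ r)

# In the eigenbasis, each mode shrinks independently by (1 - eta * lam_i) per step.
shrink = (1 - eta * lam) ** steps
print(np.allclose(Q.T @ r, shrink * (Q.T @ r0)))   # True
print(shrink)    # largest for the smallest eigenvalues, about 0 for the largest one
```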
SVD shows up everywhere:
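- Low-rank adaptation (LoRA) is one of the most widely used techniques for fine-tuning large language models. It freezes the pretrained weights and learns each weight update as a product of two small matrices, ΔW = BA, whose rank r is far smaller than the size of the weight matrix. LoRA trains B and A directly rather than computing an SVD. It relies on the same idea that truncated SVD formalizes: a rank-r matrix has at most r nonzero singular values, so it can be stored and trained with far fewer parameters. This is how LLMs are fine-tuned cheaply.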
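- The query-key score matrix QKᵀ of a transformer attention head has rank at most the head dimension, so it has an implicit low-rank structure. Researchers study its SVD (and that of the query-key weight product) to understand what the model is attending to.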
- Recommendation systems (Netflix, Spotify) use matrix factorization, which is closely related to truncated SVD: the user-item matrix is approximated by a product of two low-rank factors.
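The low-rank idea behind these uses can be sketched with a truncated SVD. The sketch assumes a synthetic, nearly rank-3 matrix (hypothetical data, not real ratings). Keeping the top k singular values gives the best rank-k approximation, by the Eckart-Young theorem. The two thin factors also need far fewer numbers than the full matrix.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical "ratings-like" matrix: approximately rank 3 plus a little noise.
m, n, true_rank = 40, 30, 3
M = rng.normal(size=(m, true_rank)) @ rng.normal(size=(true_rank, n))
M += 0.01 * rng.normal(size=(m, n))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 3
# Truncated SVD: keep only the top-k singular values and vectors.
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# Eckart-Young: the Frobenius error of the best rank-k approximation
# equals the root-sum-square of the discarded singular values.
err = np.linalg.norm(M - M_k)               # Frobenius norm by default for 2-D arrays
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))   # True

# Storage: m*n numbers for M versus k*(m + n) for the two thin factors
# (a rank-k update B @ A in the LoRA style has the same count).
print(m * n, k * (m + n))                   # 1200 210
```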
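Schur decomposition: less directly visible in day-to-day practice, but it appears in theoretical analyses of recurrent neural networks and LSTMs. Understanding why gradients vanish or explode over long sequences can involve the Schur form of the recurrent weight matrix. Its diagonal holds the eigenvalues, and its upper-triangular part captures the non-normal coupling that eigenvalues alone miss.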
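To see what the Schur form adds beyond eigenvalues, here is a sketch with a hypothetical 2×2 recurrent weight matrix. Both of its eigenvalues are below 1, but its Schur form has a large off-diagonal entry. Repeated multiplication by such a matrix can amplify vectors for many steps before the eventual decay (nonlinearities are ignored). It uses SciPy's `scipy.linalg.schur`.

```python
import numpy as np
from scipy.linalg import schur

# Hypothetical recurrent weight matrix: a rotated copy of an upper-triangular matrix
# with eigenvalues 0.9 and 0.8 (both below 1) and a large coupling term of 5.
theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W = R @ np.array([[0.9, 5.0], [0.0, 0.8]]) @ R.T

T, Z = schur(W, output="complex")
print(np.abs(np.diag(T)))   # eigenvalue magnitudes 0.9 and 0.8 (in some order)
print(abs(T[0, 1]))         # about 5: the coupling that eigenvalues alone hide

# Spectral norm of W^k: how much a vector can be amplified after k steps.
norms = [np.linalg.norm(np.linalg.matrix_power(W, k), 2) for k in range(60)]
print(max(norms) > 10, norms[-1] < 1)   # True True: transient growth, then decay
```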
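Jordan normal form: one way to see the math behind the vanishing gradient problem. Backpropagating through k time steps multiplies the gradient by, roughly, the k-th power of the recurrent weight matrix (ignoring the derivatives of the nonlinearity). The k-th power of a Jordan block with eigenvalue λ has entries λᵏ and kλᵏ⁻¹. When every eigenvalue has magnitude less than 1, the λᵏ factor makes gradients decay exponentially with sequence length; a Jordan block for a repeated eigenvalue only adds a polynomial factor that delays the decay. When some eigenvalue has magnitude greater than 1, gradients explode instead. This is part of the math behind why vanilla RNNs fail on long sequences and why LSTMs were designed the way they were.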
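The polynomial-times-exponential behavior of Jordan block powers is easy to verify. The sketch uses a 2×2 Jordan block whose eigenvalue, 0.9, is chosen for illustration.

```python
import numpy as np

lam = 0.9
J = np.array([[lam, 1.0],
              [0.0, lam]])   # 2x2 Jordan block: eigenvalue 0.9, repeated, one eigenvector

for k in (1, 10, 50, 100):
    Jk = np.linalg.matrix_power(J, k)
    # Closed form: J^k = [[lam^k, k * lam^(k-1)], [0, lam^k]]
    closed = np.array([[lam**k, k * lam**(k - 1)],
                       [0.0, lam**k]])
    # The off-diagonal entry first grows (factor k), then decays (factor lam^(k-1)).
    print(k, np.allclose(Jk, closed), round(Jk[0, 1], 4))
```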
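Polar decomposition: used in weight initialization research. Orthogonal initialization (which Saxe et al. recommend) derives an orthogonal weight matrix from a random one. Taking the Q factor of the polar decomposition of a random Gaussian matrix is one way to do this, and that Q is the orthogonal matrix closest to the original; many implementations use a QR decomposition instead. Polar decomposition is also used in some optimization algorithms that try to keep weight matrices near-orthogonal during training to preserve gradient flow.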
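Below is a sketch of orthogonal initialization via the polar factor. It assumes a stack of purely linear square layers (no activations) and a hypothetical helper `polar_orthogonal`. Orthogonal layers preserve the norm of the signal exactly at any depth. Gaussian layers with variance 1/n preserve it only on average.

```python
import numpy as np

rng = np.random.default_rng(8)
n, depth = 64, 50

def polar_orthogonal(M):
    """Hypothetical helper: orthogonal factor Q = U V^T of the polar decomposition M = Q S."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

x = rng.normal(size=n)
h_orth, h_gauss = x.copy(), x.copy()
for _ in range(depth):
    G = rng.normal(size=(n, n)) / np.sqrt(n)   # Gaussian init with variance 1/n
    h_orth = polar_orthogonal(G) @ h_orth       # orthogonal init derived from the same draw
    h_gauss = G @ h_gauss

print(np.isclose(np.linalg.norm(h_orth), np.linalg.norm(x)))   # True: norm preserved
print(np.linalg.norm(h_gauss) / np.linalg.norm(x))             # generally not 1
```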