
What Models Prefer to Learn: A Geometric Framing of Architecture and Regularization
Author(s): Sigurd Roll Solberg
Intro
What does a neural network really learn?
Every machine learning model, deep or shallow, learns by searching within a "hypothesis space": the set of functions it can, in principle, represent. But this space is not neutral territory. It is carved out and weighted by two forces: architecture and regularization.
- The architecture defines what can be expressed.
- Regularization defines how likely different regions of this space are to be explored or trusted.
This isn't a new observation. But as models grow more expressive and application-specific, understanding how these two elements interact becomes not just academic but foundational to intelligent model design.
Our goal in this post is to take this question seriously. We'll explore how different neural architectures sculpt the geometry and topology of hypothesis spaces, and how regularization can be viewed not simply as a constraint but as a prioritization scheme: a way of emphasizing certain "regions" of the hypothesis space over others. By reframing the problem geometrically, we aim to build intuition for what models prefer to learn, and why.
Exploration
1. A Tale of Two Learners
Imagine two neural networks trained on the same data. One is a shallow MLP; the other is a convolutional neural network. Both converge to low training error. Yet their generalization behavior differs dramatically.
Why?
Because even though both underlying architectures are "universal approximators," the shape of their hypothesis spaces is different. The MLP has no built-in notion of locality or translation invariance. It must learn such inductive biases from scratch. The CNN, by contrast, starts with a geometry: spatial locality is baked in.
This difference reflects not just a shift in what functions are representable, but in how easy it is for the optimizer to find and prefer certain solutions. The architecture defines not just a boundary around the space, but a gradient-weighted landscape over it.
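To make the contrast concrete, here is a minimal sketch (using PyTorch; the layer sizes are arbitrary illustrative choices, not from any particular experiment). It builds a small MLP and a small convolutional feature extractor and checks the CNN's shift equivariance directly. Circular padding is used so that equivariance to a circular shift holds exactly, rather than only away from the image boundary.

```python
# Illustrative sketch (PyTorch assumed; layer sizes are arbitrary choices).
import torch
import torch.nn as nn

# An MLP sees the image as an unstructured vector: no locality, no weight sharing.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# A convolutional feature extractor bakes locality and translation equivariance in.
# Circular padding makes equivariance to circular shifts exact.
cnn_features = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1, padding_mode="circular"), nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1, padding_mode="circular"), nn.ReLU(),
)

x = torch.randn(1, 1, 28, 28)
x_shifted = torch.roll(x, shifts=3, dims=-1)  # shift the input horizontally

# Shifting the input shifts the CNN's feature maps by the same amount ...
same = torch.allclose(
    cnn_features(x_shifted), torch.roll(cnn_features(x), shifts=3, dims=-1), atol=1e-5
)
print(same)  # True

# ... while the MLP's outputs bear no structural relationship to the shift.
print((mlp(x) - mlp(x_shifted)).abs().max())
```

The point is not that the CNN is "better": weight sharing and locality simply make shifted versions of a pattern the same computation, while the MLP has to discover that relationship from data.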
2. From Functions to Manifolds
To make this precise, think of the hypothesis space as a manifold embedded in a larger function space. An architecture carves out a submanifold of functions it can express. But this isn't a flat, uniform surface. It has:
- Curvature: Some functions are easier to reach (lower curvature), others harder (steep gradients, complex compositions).
- Volume: Some function classes occupy more "space"; e.g., shallow networks more easily model linear or low-frequency functions.
- Topology: Some architectures enforce continuity or symmetries that others do not.
This brings us to a geometric deep learning lens: architectural priors shape the metric and topology of the hypothesis space [2]. CNNs favor translationally equivariant functions. GNNs favor permutation invariance. Transformers? Attention-weighted global interactions.
The optimizer doesn't explore all of function space; it flows along this curved, structured manifold defined by the architecture.
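One convenient way to write this picture down (the notation here is ours, purely for illustration) is to treat the architecture as a parameterization map from weight space into function space. The hypothesis "manifold" is its image, and the pullback of the function-space inner product is what makes some directions easy to move in and others hard.

```latex
% The architecture as a parameterization map (notation is ours, for illustration).
% Weights \theta live in \Theta \subseteq \mathbb{R}^p; the architecture sends each
% \theta to the function f_\theta that the network computes.
\[
\Phi : \Theta \to \mathcal{F}, \qquad \Phi(\theta) = f_\theta,
\qquad \mathcal{H} = \Phi(\Theta) \subseteq \mathcal{F}.
\]
% Locally, the geometry of \mathcal{H} is the pullback of the function-space inner product:
\[
g_{ij}(\theta) = \big\langle \partial_{\theta_i} f_\theta,\; \partial_{\theta_j} f_\theta \big\rangle_{\mathcal{F}},
\]
% so "curvature" and "volume" above can be read as statements about how \Phi stretches
% and compresses parameter space on its way into function space.
```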
3. Regularization as a Measure over Hypothesis Space
Now enter regularization. In its classic form (e.g., L2 norm), it's often interpreted as penalizing complexity. But this view is limited. More deeply, regularization defines a measure over the hypothesis space, a way of saying: "These functions are more likely. These ones are suspect."
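The textbook case makes this concrete. Under a maximum a posteriori (MAP) reading (a standard identity, written here in our own notation), adding an L2 penalty to a negative log-likelihood loss is the same as placing a Gaussian prior on the weights. The penalty literally is a log-density over parameter space, and, pushed through the parameterization, over the functions the architecture can express.

```latex
% L2 regularization read as MAP estimation under a Gaussian prior
% (a standard identity in our own notation; L is the negative log-likelihood).
\[
\hat{\theta}
  = \arg\min_{\theta} \Big[ \underbrace{L(\theta)}_{-\log p(\mathcal{D}\mid\theta)}
      + \lambda \lVert \theta \rVert_2^2 \Big]
  = \arg\max_{\theta}\; p(\mathcal{D}\mid\theta)\, p(\theta),
\qquad
p(\theta) = \mathcal{N}\!\Big(0, \tfrac{1}{2\lambda} I\Big).
\]
```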
Dropout, for example, flattens reliance on specific units, favoring more distributed representations. Spectral norm regularization constrains Lipschitz continuity, biasing toward smoother functions. Bayesian neural networks make this idea explicit: the prior over weights induces a prior over functions.
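In code, these biases attach at different points of the training setup. A minimal PyTorch sketch, purely illustrative (layer sizes and coefficients are placeholders, not recommendations):

```python
# Illustrative sketch (PyTorch assumed): where three common regularizers attach.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

model = nn.Sequential(
    # Spectral norm caps the layer's largest singular value, bounding its Lipschitz
    # constant and biasing the model toward smoother functions.
    spectral_norm(nn.Linear(784, 256)),
    nn.ReLU(),
    # Dropout randomly zeroes units during training, discouraging reliance on any
    # single unit and favoring more distributed representations.
    nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

# Weight decay enters through the optimizer: the L2 penalty that, in the MAP reading
# above, corresponds to a Gaussian prior over the weights.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
```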
Viewed this way, regularization isn't a constraint on learning; it's a shaping force. It sculpts the energy landscape. It changes which valleys the optimizer is most likely to settle into.
This becomes especially interesting when we realize that different regularizers and architectures may interact nonlinearly. A regularizer that improves generalization in one architecture may hurt it in another, simply because the underlying hypothesis space is differently curved or composed.
Resolution
A Geometric Framing of Learning Bias
Let's sharpen the central claim:
Learning is a process of moving along a structured manifold, defined by the architecture, following a flow field shaped by regularization, in pursuit of a low-energy state defined by the loss function.
In this framing:
- Architecture defines the manifold of functions the model can express: the terrain on which learning happens.
- Regularization imposes a density or potential field over this terrain: some directions become easier, some harder.
- The loss function defines the energy landscape: it tells us where the valleys lie, where the model should settle.
The optimization algorithm, usually gradient descent, acts as a navigator. But it doesn't traverse all of function space. It flows along this manifold, biased by regularization, toward regions of low loss.
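In symbols (again our own notation, tying the pieces together): the trajectory in parameter space is a regularized gradient flow, and the architecture's parameterization carries it onto the hypothesis manifold.

```latex
% Regularized gradient flow (our notation). The loss L defines the energy landscape,
% the regularizer R tilts it, and the parameterization \Phi from earlier carries the
% trajectory onto the hypothesis manifold \mathcal{H} = \Phi(\Theta).
\[
\dot{\theta}(t) = -\,\nabla_{\theta}\Big[ L\big(f_{\theta(t)}\big) + \lambda\, R\big(\theta(t)\big) \Big],
\qquad
f_{\theta(t)} = \Phi\big(\theta(t)\big) \in \mathcal{H}.
\]
```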
This perspective reframes generalization not as mere convergence, but as a bias-aware descent on a curved manifold, where both geometry and preference shape the final outcome.
Conclusion
Designing With Geometry in Mind
If we accept that architecture and regularization jointly shape the hypothesis space, then several strategic insights follow:
- Architectural choices should be guided not just by empirical performance but by understanding what kind of manifold they induce. Geometry matters.
- Regularization strategies should be tuned to the architecture β not just in hyperparameter terms, but in philosophical terms: what kind of functions are we favoring?
- Future research might benefit from explicit characterizations of these manifolds: can we map the implicit bias of different models, or even interpolate between hypothesis spaces?
Perhaps most provocatively: we may want to design architectures and regularizers in tandem, as complementary instruments in sculpting the modelβs functional landscape.
This is not a call to abandon empirical methods. But it is a call to infuse them with geometric and probabilistic awareness. To think not just in terms of performance, but of preference β what our models are predisposed to learn, and why.
If geometric deep learning taught us that data lives on a manifold, then perhaps the next lesson is this: so do our models.
References
- [1] Poggio et al., βTheory of Deep Learning III: Explaining the Non-overfitting Puzzleβ
- [2] Bronstein et al., βGeometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gaugesβ