What Models Prefer to Learn: A Geometric Framing of Architecture and Regularization

Last Updated on May 9, 2025 by Editorial Team

Author(s): Sigurd Roll Solberg

Originally published on Towards AI.

An adventure through unknown landscapes. By Grok.

Intro

What does a neural network really learn?

Every machine learning model, deep or shallow, learns by searching within a β€œhypothesis space” β€” the set of functions it can, in principle, represent. But this space is not neutral territory. It is carved out and weighted by two forces: architecture and regularization.

  • The architecture defines what can be expressed.
  • Regularization defines how likely different regions of this space are to be explored or trusted.

This isn’t a new observation. But as models grow more expressive and application-specific, understanding how these two elements interact becomes not merely academic but foundational to intelligent model design.

Our goal in this post is to take this question seriously. We’ll explore how different neural architectures sculpt the geometry and topology of hypothesis spaces, and how regularization can be viewed not simply as a constraint but as a prioritization scheme β€” a way of emphasizing certain β€œregions” of the hypothesis space over others. By reframing the problem geometrically, we aim to build intuition for what models prefer to learn, and why.

Exploration

1. A Tale of Two Learners

Imagine two neural networks trained on the same data. One is a shallow MLP; the other is a convolutional neural network. Both converge to low training error. Yet their generalization behavior differs dramatically.

Why?

Because even though both underlying architectures are β€œuniversal approximators,” the shape of their hypothesis spaces is different. The MLP has no built-in notion of locality or translation invariance. It must learn such inductive biases from scratch. The CNN, by contrast, starts with a geometry: spatial locality is baked in.

This difference reflects not just a shift in what functions are representable, but in how easy it is for the optimizer to find and prefer certain solutions. The architecture defines not just a boundary around the space, but a gradient-weighted landscape over it.
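To make the tale concrete, here is a minimal sketch, assuming PyTorch; the input size, widths, and channel counts are illustrative choices, not taken from any experiment in this post. Both networks map the same 28×28 input to ten logits, but the MLP wires every pixel to every unit, while the CNN spends far fewer parameters on local, weight-shared filters:

```python
# A minimal sketch, assuming PyTorch; layer sizes are illustrative only.
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Flatten(),                 # throws away spatial structure up front
    nn.Linear(28 * 28, 128),      # every pixel connects to every hidden unit
    nn.ReLU(),
    nn.Linear(128, 10),
)

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # local, weight-shared filters
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),      # global pooling: a position-agnostic summary
    nn.Flatten(),
    nn.Linear(8, 10),
)

x = torch.randn(1, 1, 28, 28)     # same input, same 10-logit output space
print(mlp(x).shape, cnn(x).shape)
print(sum(p.numel() for p in mlp.parameters()),  # 101,770 parameters
      sum(p.numel() for p in cnn.parameters()))  # 170 parameters
```

The parameter counts alone hint at how differently the two carve out their hypothesis spaces, even though both can reach low training error.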

2. From Functions to Manifolds

To make this precise, think of the hypothesis space as a manifold embedded in a larger function space. An architecture carves out a submanifold of functions it can express. But this isn’t a flat, uniform surface. It has:

  • Curvature: Some functions are easier to reach (lower curvature), others harder (steep gradients, complex compositions).
  • Volume: Some function classes occupy more β€œspace” β€” e.g., shallow networks more easily model linear or low-frequency functions.
  • Topology: Some architectures enforce continuity or symmetries that others do not.

This brings us to a geometric deep learning lens: architectural priors shape the metric and topology of the hypothesis space [2]. CNNs favor translationally equivariant functions. GNNs favor permutation invariance. Transformers? Attention-weighted global interactions.

The optimizer doesn’t explore all of function space β€” it flows along this curved, structured manifold defined by the architecture.
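The convolutional case can even be checked numerically. In the sketch below (PyTorch assumed; circular padding is chosen so the symmetry holds exactly rather than up to boundary effects), shifting the input and then convolving gives the same result as convolving and then shifting:

```python
# Equivariance check: conv(shift(x)) == shift(conv(x)) with circular padding.
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv1d(1, 1, kernel_size=3, padding=1,
                 padding_mode="circular", bias=False)
x = torch.randn(1, 1, 32)

def shift(t, k=5):
    return torch.roll(t, shifts=k, dims=-1)  # circular shift along the signal

print(torch.allclose(conv(shift(x)), shift(conv(x)), atol=1e-6))  # True
```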

3. Regularization as a Measure over Hypothesis Space

Now enter regularization. In its classic form (e.g., L2 norm), it’s often interpreted as penalizing complexity. But this view is limited. More deeply, regularization defines a measure over the hypothesis space β€” a way of saying: β€œThese functions are more likely. These ones are suspect.”

Dropout, for example, flattens reliance on specific units, favoring more distributed representations. Spectral norm regularization constrains Lipschitz continuity, biasing toward smoother functions. Bayesian neural networks make this idea explicit: the prior over weights induces a prior over functions.
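The Bayesian reading can be made concrete in a few lines. In this sketch (PyTorch assumed; the model and the value of `lam` are illustrative), the familiar L2 penalty is literally the negative log of a Gaussian prior over the weights, so minimizing the regularized loss is maximum a posteriori estimation:

```python
# L2 regularization as a Gaussian prior over weights (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

def regularized_loss(data_loss, model, lam=1e-4):
    # data_loss plays the role of -log p(D | w); the penalty below is
    # -log p(w) for a Gaussian prior, up to an additive constant.
    l2 = sum((p ** 2).sum() for p in model.parameters())
    return data_loss + lam * l2  # minimizing this = MAP estimation

model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = regularized_loss(F.mse_loss(model(x), y), model)
loss.backward()  # gradients flow through likelihood and "prior" alike
```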

Viewed this way, regularization isn’t a constraint on learning β€” it’s a shaping force. It sculpts the energy landscape. It changes which valleys the optimizer is most likely to settle into.

This becomes especially interesting when we realize that different regularizers and architectures may interact nonlinearly. A regularizer that improves generalization in one architecture may hurt it in another, simply because the underlying hypothesis space is differently curved or composed.

Resolution

A Geometric Framing of Learning Bias

Let’s sharpen the central claim:

Learning is a process of moving along a structured manifold, defined by the architecture, following a flow field shaped by regularization, in pursuit of a low-energy state defined by the loss function.

In this framing:

  • Architecture defines the manifold of functions the model can express β€” the terrain on which learning happens.
  • Regularization imposes a density or potential field over this terrain β€” some directions become easier, some harder.
  • The loss function defines the energy landscape β€” it tells us where the valleys lie, where the model should settle.

A 3D curved manifold representing a neural network’s hypothesis space, with gradients guided by regularization descending toward a low-loss region. Generated by ChatGPT.
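One compact way to write this framing down (the notation is ours and purely schematic) is as a gradient flow over parameters, with the three ingredients appearing as three distinct pieces:

```latex
% Learning as gradient flow (schematic): L is the loss (the energy
% landscape), R the regularizer (the potential field), and f_theta ranges
% over the manifold of functions the architecture can express.
\dot{\theta} = -\nabla_{\theta}\Big[ L\big(f_{\theta}\big) + R(\theta) \Big],
\qquad f_{\theta} \in \mathcal{F}_{\mathrm{arch}} \subset \mathcal{F}
```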

The optimization algorithm β€” usually gradient descent β€” acts as a navigator. But it doesn’t traverse all of function space. It flows along this manifold, biased by regularization, toward regions of low loss.

This perspective reframes generalization not as mere convergence, but as a bias-aware descent on a curved manifold, where both geometry and preference shape the final outcome.
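A toy example makes the bias-aware part visible. In the sketch below (PyTorch assumed; the one-dimensional loss is contrived purely for illustration), the unregularized loss has two equally deep valleys, and a small L2 term reshapes the terrain so that descent from the same starting point settles in the low-norm valley instead:

```python
# Toy "bias-aware descent": the regularizer decides which valley we land in.
import torch

def loss(w):
    return (w + 3) ** 2 * (w - 1) ** 2 / 10.0  # equal minima at w = -3 and +1

def descend(lam, w0=-1.05, lr=0.05, steps=500):
    w = torch.tensor(w0, requires_grad=True)
    for _ in range(steps):
        total = loss(w) + lam * w ** 2   # the L2 term reshapes the terrain
        total.backward()
        with torch.no_grad():
            w -= lr * w.grad
        w.grad.zero_()
    return w.item()

print(descend(lam=0.0))  # ~ -3.0: rolls into the high-norm valley
print(descend(lam=0.1))  # ~ +0.93: the tilted terrain prefers low norm
```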

Conclusion

Designing With Geometry in Mind

If we accept that architecture and regularization jointly shape the hypothesis space, then several strategic insights follow:

  • Architectural choices should be guided not just by empirical performance but by understanding what kind of manifold they induce. Geometry matters.
  • Regularization strategies should be tuned to the architecture β€” not just in hyperparameter terms, but in philosophical terms: what kind of functions are we favoring?
  • Future research might benefit from explicit characterizations of these manifolds: can we map the implicit bias of different models, or even interpolate between hypothesis spaces?

Perhaps most provocatively: we may want to design architectures and regularizers in tandem, as complementary instruments in sculpting the model’s functional landscape.

This is not a call to abandon empirical methods. But it is a call to infuse them with geometric and probabilistic awareness. To think not just in terms of performance, but of preference β€” what our models are predisposed to learn, and why.

If geometric deep learning taught us that data lives on a manifold, then perhaps the next lesson is this: so do our models.

References

  • [1] Poggio et al., β€œTheory of Deep Learning III: Explaining the Non-overfitting Puzzle”
  • [2] Bronstein et al., β€œGeometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges”


Published via Towards AI
