Why You Should Always Start With a Baseline Model

Last Updated on March 7, 2024 by Editorial Team

Author(s): Jonte Dancker

Originally published on Towards AI.

When we start a new ML project, we usually want to do the interesting stuff as soon as possible. Test the latest model we have read about. Test the state-of-the-art model that promises the best results. Build something complex that we can show off to our colleagues and friends. Do the fun and challenging stuff.

It is easy to get carried away. We spend hours getting a model to run and fine-tuning its hyperparameters.

But how do we know the model performance is good? How do we know if the complexity of the model is reasonable? We cannot tell! There is nothing we can compare our results against. Nothing that gives us a reference point.

That’s why we need a baseline model.

A baseline model should be simple, fast to build, and explainable.

Often, a baseline model takes 10 % of the time to develop but gets us 90 % of the way to achieve reasonable results.

A strong baseline model is a precious part of our model development and evaluation process. The baseline model gives us context, supports our decision-making, allows us to iterate faster, and can expose us to production data early.

Hence, always build a baseline model first.

A baseline model gives us context

Without context, our evaluation metrics do not tell us much. Is an RMSE of 100 bad? Is an accuracy of 80 % good? Without a benchmark, these numbers have no perspective or meaning. This benchmark is our baseline model. It helps us interpret the evaluation metrics. It also helps us understand how changes in our evaluation metrics affect our business metrics.
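To make this concrete, here is a minimal sketch of how a mean-prediction baseline can put an RMSE into perspective. The data and the predictions of the "complex" model are made up for illustration; only the comparison pattern matters.

```python
import math
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


# Hypothetical target values, invented for this example.
y_train = [900.0, 1100.0, 1000.0, 1200.0, 800.0]
y_test = [1000.0, 1300.0, 700.0, 1000.0]

# Baseline: always predict the mean of the training targets.
baseline_value = mean(y_train)
baseline_pred = [baseline_value] * len(y_test)

# Predictions of some hypothetical, more complex model.
model_pred = [1050.0, 1200.0, 800.0, 950.0]

baseline_rmse = rmse(y_test, baseline_pred)
model_rmse = rmse(y_test, model_pred)

print(f"Baseline RMSE: {baseline_rmse:.1f}")
print(f"Model RMSE:    {model_rmse:.1f}")
```

On its own, the model's RMSE says little. Next to the baseline's RMSE on the same test data, we can see how much of the error the model actually removes.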

A baseline model supports decision-making

With this context, we can choose between models and allocate our resources more deliberately. We can decide if it is worth continuing with a model or if the complexity of a model is worth its added value. Based on the baseline, we can also decide if it is worth spending more time on model development.
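One simple way to support such a decision is to express each candidate's error relative to the baseline. The sketch below assumes an error metric where lower is better (such as RMSE). The model names and error values are hypothetical.

```python
def relative_improvement(baseline_error, model_error):
    """Fraction of the baseline error that the model removes.

    0.0 means no better than the baseline, 1.0 means a perfect model,
    and negative values mean the model is worse than the baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error


# Hypothetical errors measured on the same test set.
baseline_error = 100.0
candidate_errors = {
    "linear_model": 92.0,
    "gradient_boosting": 70.0,
    "deep_network": 68.0,
}

improvements = {
    name: relative_improvement(baseline_error, error)
    for name, error in candidate_errors.items()
}

for name, gain in improvements.items():
    print(f"{name}: {gain:.0%} better than the baseline")
```

In this made-up example, the deep network gains only two percentage points over gradient boosting. Whether that is worth the extra complexity is a judgment call, but the baseline makes the trade-off visible.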

A baseline model lets us iterate faster

With a baseline model, we get feedback on models, data, and the problem from the beginning. We can focus on things that add value. We do not get tangled up in unnecessary model complexity.

Early feedback on models can guide our model decisions. We can identify if a model can add value and is worth our time. We can also identify modifications that improve model performance.

Early feedback on our data is crucial. We need to find out as early as possible whether the data has enough predictive power. Otherwise, we spend days or weeks training a complex model on bad data, wasting our time. With the baseline, we can get a feeling for the data. Which target values are difficult to predict? Which target values are so similar that the model struggles to tell them apart? Which features are important? We can then use all this information when building more complex models.
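As a rough sketch of this kind of feedback, we can count how often a baseline confuses one target value with another and how well it does per class. The labels and predictions below are invented. In practice they would come from our baseline model on held-out data.

```python
from collections import Counter

# Hypothetical true labels and baseline predictions.
y_true = ["cat", "dog", "dog", "bird", "cat", "dog", "bird", "cat"]
y_pred = ["cat", "cat", "dog", "bird", "dog", "dog", "bird", "cat"]

# Count every (true label, predicted label) pair that is a mistake.
confusions = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)

# Per-class recall: share of each true class that was predicted correctly.
totals = Counter(y_true)
correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
recall = {label: correct[label] / totals[label] for label in totals}

for (true_label, predicted_label), count in confusions.most_common():
    print(f"{true_label} predicted as {predicted_label}: {count}x")
for label, value in recall.items():
    print(f"recall for {label}: {value:.2f}")
```

Here, "cat" and "dog" get mixed up while "bird" is always recognized, which hints at where a more complex model or better features would need to help.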

Early feedback on the problem helps us to understand its complexity. We can check our assumptions and hypotheses, which is crucial for developing a good model.

A baseline model can expose us to production data early

A simple baseline model is easy and fast to deploy as it doesn’t need much engineering. Having a model in production as fast as possible is valuable. We get exposed to production data early on in the process. We can identify defects in the data and test our assumptions, saving us a lot of time and resources. Little is worse than spending time developing a model only to realize that the training data did not reflect the production data or that our assumptions were wrong.

Because the baseline model is simple, we can concentrate on finding and fixing bugs in the production pipeline itself. When we want to deploy more complex models, we already have a proven inference pipeline. With this, we can reduce the complexity and increase the speed of deploying complex models.

But what is a good baseline model?

Usually, rule-based or common-sense models do the job, as they often compare well with complex ML models. These models use simple rules, are fast to build, and don’t need much data during inference. Hence, they are easy to interpret and explain. In a classification problem, we could make a random prediction or always predict the majority class. In a regression problem, we could predict the mean or median. Or, if we have seasonal data, we can use a seasonal model, for example one that repeats the value from the same point in the previous season.
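These baselines fit in a few lines of plain Python. The following is a minimal sketch. The function names are chosen for this example and do not come from any particular library.

```python
from collections import Counter
from statistics import median


def majority_class_baseline(y_train):
    """Classification: return the most frequent label in the training data."""
    return Counter(y_train).most_common(1)[0][0]


def median_baseline(y_train):
    """Regression: return the median of the training targets."""
    return median(y_train)


def seasonal_naive_forecast(history, season_length, horizon):
    """Seasonal data: repeat the last observed season for `horizon` steps."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[step % season_length] for step in range(horizon)]


# Usage with made-up data.
print(majority_class_baseline(["spam", "ham", "ham"]))
print(median_baseline([1, 5, 100]))
print(seasonal_naive_forecast([1, 2, 3, 4, 5, 6], season_length=3, horizon=5))
```

Each function returns a constant prediction or a simple repetition, so there is nothing to tune, and the behavior is easy to explain to anyone.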

Yet, we need to be careful when choosing our baseline model. If we pick a baseline that is too weak, almost any model will look good in comparison, and our results can be misleading.
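A small sketch with imbalanced classes shows how a weak baseline can mislead. The labels and the model accuracy are made up, and for brevity the majority label is taken from the test labels, whereas in practice it would come from the training data.

```python
from collections import Counter

# Hypothetical, heavily imbalanced test labels.
y_test = ["ok"] * 95 + ["fraud"] * 5

# Weak baseline: guess uniformly at random among the classes.
# Its expected accuracy is 1 / number of classes, whatever the class balance.
n_classes = len(set(y_test))
random_accuracy = 1 / n_classes

# Stronger baseline: always predict the majority class.
majority_label = Counter(y_test).most_common(1)[0][0]
majority_accuracy = sum(label == majority_label for label in y_test) / len(y_test)

# Accuracy of some hypothetical complex model.
model_accuracy = 0.90

print(f"Random guess:   {random_accuracy:.2f}")
print(f"Majority class: {majority_accuracy:.2f}")
print(f"Complex model:  {model_accuracy:.2f}")
```

Compared with random guessing, the model looks impressive. Compared with the majority-class baseline, it is actually worse, which is the signal we would have missed with the weaker reference point.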

Published via Towards AI

Note: Article content contains the views of the contributing authors and not Towards AI.
