The Danger of Large (or Small) Numbers in Your Computer and ML Models
Last Updated on October 7, 2025 by Editorial Team
Author(s): Nelson Cruz
Originally published on Towards AI.
In day-to-day programming or general computer use, it's easy to overlook how the computer actually represents numbers. But this can quickly become a problem when we try to optimize a solution, and sometimes in situations we simply can't avoid.
What the danger really is
Computers represent numbers using bits, their most basic binary unit. Every numeric type has a fixed bit width determined by the hardware and the chosen type, so if we imagine a computer that stores integers in only 3 bits, we have the following situation:

When we add 1 to the maximum value those bits can hold, we run into a problem: we would need one more bit to represent the result. In computing, this is called an integer overflow. When a sum produces an integer too large for the chosen type, the result "wraps around."
For example, in Python:
import numpy as np
# Define the largest 32-bit signed integer
x = np.int32(2147483647)
print("Before overflow:", x)
# Add 1 -> causes overflow
x = x + np.int32(1)
print("After overflow:", x)
Output:
Before overflow: 2147483647
After overflow: -2147483648
This behavior isn't a bug, but rather a consequence of the limits of binary representation, and it has caused several famous real-world failures.
Unexpected famous cases
The Boeing 787 Case (2015)
In 2015, Boeing discovered that the Boeing 787 Dreamliner’s generators could shut down mid-flight if they were left on for 248 consecutive days without being restarted.
The reason? An internal timer, based on 32-bit integers, would overflow after this period, leading to a failure in the aircraft’s power management.
The fix was simple: periodically restart the system so the counter resets to zero. But the potential impact was enormous.
The Level 256 Bug in Pac-Man
Those who played Pac-Man in the arcades may be familiar with the “Kill Screen.” After level 255, the level counter (stored in 8 bits) overflows upon reaching 256. This creates a glitched screen, with half the maze unreadable, making the game impossible to complete.
The developers didn’t expect anyone to play 256 levels of Pac-Man, so they didn’t handle this exception!

The Year 2038 bug
Just before the year 2000, the "millennium bug" was a popular topic: many computers were expected to misbehave when the date rolled over from 31/12/99 to 01/01/00 at midnight. Fortunately, everything turned out fine, but now another potentially catastrophic date looms, like a new Mayan prophecy.
Many Unix and C systems use a signed 32-bit integer to count seconds since January 1, 1970 (the famous Unix timestamp). This counter will reach its limit on January 19, 2038, overflowing after 2,147,483,647 seconds. If left unfixed, any software that relies on time could exhibit unpredictable behavior.
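We can mimic this with NumPy's 32-bit integers; a minimal sketch (real systems store the timestamp in the C type time_t, not a NumPy scalar):
import numpy as np
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest number of seconds a signed 32-bit counter can hold
max_seconds = np.int32(2147483647)
print("Last representable moment:", EPOCH + timedelta(seconds=int(max_seconds)))
# -> 2038-01-19 03:14:07 UTC

# One more second wraps the counter to the most negative 32-bit value
wrapped = max_seconds + np.int32(1)  # NumPy may emit an overflow warning here
print("A naive 32-bit clock would then read:", EPOCH + timedelta(seconds=int(wrapped)))
# -> 1901-12-13 20:45:52 UTC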
And these situations don't just happen with integers. With floating-point numbers the situation is even more delicate, especially when it comes to numerical precision, as in Machine Learning.
How Float Variables Work
Floats (floating-point numbers) are used to represent real numbers in computers, but unlike integers, they cannot represent every value exactly. Instead, they store numbers approximately using a sign, exponent, and mantissa (according to the IEEE 754 standard).

And just like the integers in the previous examples, the mantissa and exponent are stored in a finite number of bits. The range and precision therefore depend on how many bits the declared type uses, typically 16, 32, or 64.
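A quick way to see this approximation in practice (a standard illustration, not specific to any ML library):
import numpy as np

# 0.1 and 0.2 have no exact binary representation, so their sum is only close to 0.3
x = np.float64(0.1) + np.float64(0.2)
print(x)                   # 0.30000000000000004
print(x == 0.3)            # False
print(np.isclose(x, 0.3))  # True: compare floats with a tolerance instead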
Float16 (16 bits):
- Can represent values roughly from 6.1 × 10⁻⁵ to 6.5 × 10⁴
- Precision of about 3–4 decimal digits
- Uses 2 bytes (16 bits) of memory
Float32 (32 bits):
- Can represent values roughly from 1.4 × 10⁻⁴⁵ to 3.4 × 10³⁸
- Precision of about 7 decimal digits
- Uses 4 bytes of memory
Float64 (64 bits):
- Can represent values roughly from 5 × 10⁻³²⁴ to 1.8 × 10³⁰⁸
- Precision of about 16 decimal digits
- Uses 8 bytes of memory
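These limits don't need to be memorized; NumPy reports them directly through np.finfo. A small sketch:
import numpy as np

# Query the limits of each floating-point type
for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    # .tiny is the smallest *normal* positive value; subnormal values can reach even lower
    print(f"{np.dtype(dtype).name}: smallest normal ~{info.tiny:.3g}, "
          f"largest ~{info.max:.3g}, ~{info.precision} decimal digits of precision")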
The trade-offs in Machine Learning
Float64's higher precision uses twice the memory of float32 and is often slower because of it, so is it really necessary to use float64?
Deep learning models can have hundreds of millions of parameters, and using float64 would double the memory consumption. For many ML models, including neural networks, float32 is sufficient and allows faster computation with lower memory usage. Some teams are even exploring float16.
In theory, always using the highest-precision type seems safe, but in practice modern GPUs (RTX cards, for example) perform poorly on float64 and are instead optimized for float32 and, in some cases, float16. Float64 operations can be 10 to 30 times slower on GPUs optimized for float32.
A simple benchmark can be run by multiplying matrices:
import numpy as np
import time

# Matrix size
N = 500

# Matrices with different float bit widths
A32 = np.random.rand(N, N).astype(np.float32)
B32 = np.random.rand(N, N).astype(np.float32)
A64 = A32.astype(np.float64)
B64 = B32.astype(np.float64)
A16 = A32.astype(np.float16)
B16 = B32.astype(np.float16)

def benchmark(A, B, dtype_name):
    start = time.time()
    C = A @ B  # matrix multiplication
    end = time.time()
    print(f"{dtype_name}: {end - start:.5f} seconds")

benchmark(A16, B16, "float16")
benchmark(A32, B32, "float32")
benchmark(A64, B64, "float64")
Example output (exact timings will depend on your hardware):
float16: 0.01 seconds
float32: 0.02 seconds
float64: 0.15 seconds
That said, an important point is that common numerical problems in Machine Learning, such as vanishing gradients, are not solved simply by increasing precision, but rather by making good architectural choices.
Some good practices to address this
In deep networks, gradients can become very small after passing through many layers. In float32, values below roughly 1.4 × 10⁻⁴⁵ underflow to exactly zero.
This means that the weights are no longer updated — the infamous vanishing gradient problem.
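A toy illustration of the underflow; the 0.1 shrink factor per layer is made up purely to show the effect:
import numpy as np

# Pretend each of 60 layers shrinks the gradient by a factor of 10
grad = np.float32(1.0)
for layer in range(60):
    grad *= np.float32(0.1)

print(grad)  # 0.0 -- the value has underflowed past float32's smallest representable number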
But the solution isn’t to migrate to float64. Instead, we have smarter solutions.
ReLU: Unlike sigmoid and tanh, which saturate and squash the gradient toward zero, ReLU keeps the derivative equal to 1 for x > 0.
This prevents the gradient from reaching zero too quickly.
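A quick sketch comparing the two derivatives (the helper functions below are written just for this example):
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never exceeds 0.25, so it shrinks the gradient at every layer

def relu_grad(x):
    return (x > 0).astype(np.float32)  # exactly 1 for positive inputs

x = np.float32(3.0)
print("Sigmoid derivative at x=3:", sigmoid_grad(x))  # ~0.045
print("ReLU derivative at x=3:   ", relu_grad(x))     # 1.0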

Batch Normalization: Normalizes the activations in each batch to keep means close to 0 and variances close to 1. This way, the values remain within the safe range of float32 representation.
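A minimal sketch of that normalization step, leaving out the learned scale and shift parameters a real BatchNorm layer adds back:
import numpy as np

def batch_norm(activations, eps=1e-5):
    # Normalize each feature over the batch dimension
    mean = activations.mean(axis=0)
    var = activations.var(axis=0)
    return (activations - mean) / np.sqrt(var + eps)

# Activations that drifted far from zero get pulled back to mean ~0, variance ~1
acts = (np.random.randn(64, 8) * 50.0 + 200.0).astype(np.float32)
normed = batch_norm(acts)
print(normed.mean(), normed.std())  # roughly 0.0 and 1.0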
Residual Connections (ResNet): They create "shortcuts" that add a layer's input directly to its output (y = x + F(x)), so the gradient can flow across many layers without vanishing. They allow networks with 100+ layers to work well in float32.
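A toy residual block in NumPy; the inner layer here is just a linear transform plus ReLU, chosen only for illustration:
import numpy as np

def layer(x, W):
    return np.maximum(x @ W, 0.0)  # linear transform followed by ReLU

def residual_block(x, W):
    # The skip connection adds the input back: y = x + F(x),
    # so the gradient always has an identity path even if F's gradient is tiny
    return x + layer(x, W)

x = np.random.randn(4, 16).astype(np.float32)
W = (np.random.randn(16, 16) * 0.01).astype(np.float32)
print(residual_block(x, W).shape)  # (4, 16)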
Published via Towards AI
Note: Content contains the views of the contributing authors and not Towards AI.