The Danger of Large (or Small) Numbers in Your Computer and ML Models
Last Updated on October 7, 2025 by Editorial Team
Author(s): Nelson Cruz
Originally published on Towards AI.
In day-to-day programming or general computer use, it's easy to overlook how the computer actually represents numbers. But this can quickly become a problem when we try to optimize a solution, and sometimes in situations we simply can't avoid.
What the danger really is
Computers represent numbers using bits, their most basic binary unit. Every numeric type has a fixed bit width determined by the hardware and the chosen type, so if we imagine a computer that stores integers in only 3 bits, we have the following situation:

When we add 1 to the maximum value those bits can hold, we run into a problem: we would need one more bit to represent the result. In computing, this is called an integer overflow. When a sum produces an integer too large for the chosen type, the result "wraps around."
For example, in Python:
import numpy as np
# Define the largest 32-bit signed integer
x = np.int32(2147483647)
print("Before overflow:", x)
# Add 1 -> causes overflow
x = x + np.int32(1)
print("After overflow:", x)
Output:
Before overflow: 2147483647
After overflow: -2147483648
This behavior isn't a bug, but rather a consequence of the limits of binary representation, and it has caused several famous real-world failures.
Unexpected famous cases
The Boeing 787 Case (2015)
In 2015, Boeing discovered that the Boeing 787 Dreamliner’s generators could shut down mid-flight if they were left on for 248 consecutive days without being restarted.
The reason? An internal timer, based on 32-bit integers, would overflow after this period, leading to a failure in the aircraft’s power management.
The fix was simple: periodically restart the system so the counter resets to zero. But the potential impact was enormous.
The Level 256 Bug in Pac-Man
Those who played Pac-Man in the arcades may be familiar with the “Kill Screen.” After level 255, the level counter (stored in 8 bits) overflows upon reaching 256. This creates a glitched screen, with half the maze unreadable, making the game impossible to complete.
The developers didn’t expect anyone to play 256 levels of Pac-Man, so they didn’t handle this exception!

The Year 2038 bug
Just before the year 2000, the "millennium bug" was a popular topic: many computers were expected to misbehave when the date rolled over from 31/12/99 to 01/01/00 at midnight. Fortunately, everything turned out fine, but now another potentially catastrophic date looms, like a new Mayan prophecy.
Many Unix and C systems use a signed 32-bit integer to count seconds since January 1, 1970 (the famous Unix timestamp). This counter will reach its limit on January 19, 2038, overflowing after 2,147,483,647 seconds. If left unfixed, any software that relies on time could exhibit unpredictable behavior.
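We can mimic this with NumPy's 32-bit integers; a minimal sketch (real systems store the timestamp in the C type time_t, not a NumPy scalar):
import numpy as np
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest number of seconds a signed 32-bit counter can hold
max_seconds = np.int32(2147483647)
print("Last representable moment:", EPOCH + timedelta(seconds=int(max_seconds)))
# -> 2038-01-19 03:14:07 UTC

# One more second wraps the counter to the most negative 32-bit value
wrapped = max_seconds + np.int32(1)  # NumPy may emit an overflow warning here
print("A naive 32-bit clock would then read:", EPOCH + timedelta(seconds=int(wrapped)))
# -> 1901-12-13 20:45:52 UTC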
And these situations don't just happen with integers. With floating-point numbers the situation is even more delicate, especially when it comes to numerical precision, as in Machine Learning.
How Float Variables Work
Floats (floating-point numbers) are used to represent real numbers in computers, but unlike integers, they cannot represent every value exactly. Instead, they store numbers approximately using a sign, exponent, and mantissa (according to the IEEE 754 standard).

And just like the integers in the previous examples, the mantissa and exponent are stored in a finite number of bits. The range and precision therefore depend on how many bits the declared type uses, typically 16, 32, or 64.
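A quick way to see this approximation in practice (a standard illustration, not specific to any ML library):
import numpy as np

# 0.1 and 0.2 have no exact binary representation, so their sum is only close to 0.3
x = np.float64(0.1) + np.float64(0.2)
print(x)                   # 0.30000000000000004
print(x == 0.3)            # False
print(np.isclose(x, 0.3))  # True: compare floats with a tolerance instead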
Float16 (16 bits):
- Can represent values roughly from 6.1 × 10⁻⁵ to 6.5 × 10⁴
- Precision of about 3–4 decimal digits
- Uses 2 bytes (16 bits) of memory
Float32 (32 bits):
- Can represent values roughly from 1.4 × 10⁻⁴⁵ to 3.4 × 10³⁸
- Precision of about 7 decimal digits
- Uses 4 bytes of memory
Float64 (64 bits):
- Can represent values roughly from 5 × 10⁻³²⁴ to 1.8 × 10³⁰⁸
- Precision of about 16 decimal digits
- Uses 8 bytes of memory
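These limits don't need to be memorized; NumPy reports them directly through np.finfo. A small sketch:
import numpy as np

# Query the limits of each floating-point type
for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    # .tiny is the smallest *normal* positive value; subnormal values can reach even lower
    print(f"{np.dtype(dtype).name}: smallest normal ~{info.tiny:.3g}, "
          f"largest ~{info.max:.3g}, ~{info.precision} decimal digits of precision")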
The trade-offs in Machine Learning
Float64's higher precision uses twice the memory of float32 and is often slower because of it, so is it really necessary to use float64?
Deep learning models can have hundreds of millions of parameters, and using float64 would double the memory consumption. For many ML models, including neural networks, float32 is sufficient and allows faster computation with lower memory usage. Some teams are even exploring float16.
In theory, always using the highest-precision type seems safe, but in practice modern GPUs (RTX cards, for example) perform poorly on float64 and are instead optimized for float32 and, in some cases, float16. Float64 operations can be 10 to 30 times slower on GPUs optimized for float32.
A simple benchmark can be run by multiplying matrices:
import numpy as np
import time

# Matrix size
N = 500

# Matrices with different float bit widths
A32 = np.random.rand(N, N).astype(np.float32)
B32 = np.random.rand(N, N).astype(np.float32)
A64 = A32.astype(np.float64)
B64 = B32.astype(np.float64)
A16 = A32.astype(np.float16)
B16 = B32.astype(np.float16)

def benchmark(A, B, dtype_name):
    start = time.time()
    C = A @ B  # matrix multiplication
    end = time.time()
    print(f"{dtype_name}: {end - start:.5f} seconds")

benchmark(A16, B16, "float16")
benchmark(A32, B32, "float32")
benchmark(A64, B64, "float64")
Example output (exact timings will depend on your hardware):
float16: 0.01 seconds
float32: 0.02 seconds
float64: 0.15 seconds
That said, an important point is that common numerical problems in Machine Learning, such as vanishing gradients, are not solved simply by increasing precision, but rather by making good architectural choices.
Some good practices to address this
In deep networks, gradients can become very small after passing through many layers. In float32, values below roughly 1.4 × 10⁻⁴⁵ underflow to exactly zero.
This means that the weights are no longer updated — the infamous vanishing gradient problem.
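A toy illustration of the underflow; the 0.1 shrink factor per layer is made up purely to show the effect:
import numpy as np

# Pretend each of 60 layers shrinks the gradient by a factor of 10
grad = np.float32(1.0)
for layer in range(60):
    grad *= np.float32(0.1)

print(grad)  # 0.0 -- the value has underflowed past float32's smallest representable number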
But the solution isn’t to migrate to float64. Instead, we have smarter solutions.
ReLU: Unlike sigmoid and tanh, which saturate and squash the gradient toward zero, ReLU keeps the derivative equal to 1 for x > 0.
This prevents the gradient from reaching zero too quickly.
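A quick sketch comparing the two derivatives (the helper functions below are written just for this example):
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never exceeds 0.25, so it shrinks the gradient at every layer

def relu_grad(x):
    return (x > 0).astype(np.float32)  # exactly 1 for positive inputs

x = np.float32(3.0)
print("Sigmoid derivative at x=3:", sigmoid_grad(x))  # ~0.045
print("ReLU derivative at x=3:   ", relu_grad(x))     # 1.0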

Batch Normalization: Normalizes the activations in each batch to keep means close to 0 and variances close to 1. This way, the values remain within the safe range of float32 representation.
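A minimal sketch of that normalization step, leaving out the learned scale and shift parameters a real BatchNorm layer adds back:
import numpy as np

def batch_norm(activations, eps=1e-5):
    # Normalize each feature over the batch dimension
    mean = activations.mean(axis=0)
    var = activations.var(axis=0)
    return (activations - mean) / np.sqrt(var + eps)

# Activations that drifted far from zero get pulled back to mean ~0, variance ~1
acts = (np.random.randn(64, 8) * 50.0 + 200.0).astype(np.float32)
normed = batch_norm(acts)
print(normed.mean(), normed.std())  # roughly 0.0 and 1.0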
Residual Connections (ResNet): They create "shortcuts" that add a layer's input directly to its output (y = x + F(x)), so the gradient can flow across many layers without vanishing. They allow networks with 100+ layers to work well in float32.
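A toy residual block in NumPy; the inner layer here is just a linear transform plus ReLU, chosen only for illustration:
import numpy as np

def layer(x, W):
    return np.maximum(x @ W, 0.0)  # linear transform followed by ReLU

def residual_block(x, W):
    # The skip connection adds the input back: y = x + F(x),
    # so the gradient always has an identity path even if F's gradient is tiny
    return x + layer(x, W)

x = np.random.randn(4, 16).astype(np.float32)
W = (np.random.randn(16, 16) * 0.01).astype(np.float32)
print(residual_block(x, W).shape)  # (4, 16)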
Published via Towards AI
Note: Content contains the views of the contributing authors and not Towards AI.