
Performance Optimization in NumPy (Speed Matters!)

Author(s): NIBEDITA (NS)

Originally published on Towards AI.

Hey guys! Welcome back to our NumPy for DS & DA series. This is the 9th article in the series. In the previous article, we discussed Generating Random Numbers with NumPy, along with some code examples to understand the concepts better. So, if you haven’t read the previous article yet, you can check that out first.

So, let’s get into today’s topic.

Performance Optimization — Speed Matters

When we’re working with large datasets, even a small inefficiency can slow everything down. The good news? NumPy is built for speed, and with a few smart habits, we can squeeze even more performance out of it.

Let’s explore practical ways to make NumPy operations faster. I’ll keep it simple with examples you can try right away.

1. Use Vectorized Operations Instead of Python Loops

The golden rule: avoid Python’s for loops whenever possible.

Alright! Let’s understand this with a practical example.

import numpy as np
from time import time

We’ll use the time module to compare their speeds.

arr = np.arange(1_000_000)

loop_time1 = time()
result = []
for x in arr:
    result.append(x * 2)

loop_time2 = time()
print(f"Loop Time: {loop_time2 - loop_time1} seconds")
# Loop time: 0.12406730651855469 seconds

numpy_time1 = time()
result = arr * 2
numpy_time2 = time()
print(f"Numpy Time: {numpy_time2 - numpy_time1} seconds")
# Numpy time: 0.01680159568786621 seconds

Now you can see the difference: how fast NumPy is compared to a regular Python loop.

Why?

NumPy does the math in compiled C code under the hood. One line of vectorized code can be 10–100x faster than a Python loop.

2. Choose the Right Data Type

Smaller, appropriate data types mean less memory. And less memory means faster operations.

arr = np.arange(1_000_000, dtype=np.int32)

This uses less memory than int64. If you don’t need floating-point precision, don’t use float64 by default. For huge arrays, this small change can save hundreds of MBs.
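
You can check the memory difference yourself with the nbytes attribute. A quick sketch (the default integer dtype varies by platform; int64 is typical on 64-bit Linux and macOS):

import numpy as np

arr64 = np.arange(1_000_000)                  # default integer dtype, typically int64
arr32 = np.arange(1_000_000, dtype=np.int32)

# nbytes reports the total bytes consumed by the array's data buffer
print(arr64.nbytes / 1e6)  # ~8.0 MB
print(arr32.nbytes / 1e6)  # ~4.0 MB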

3. Pre-allocate Arrays Instead of Growing Them

Repeatedly appending to a list or array forces NumPy to keep creating new blocks of memory. Let’s compare three different ways:

Block 1: Range x For Loop

data = []
for i in range(1_000_000):
    data.append(i)

arr = np.array(data)

Block 2: NumPy Array x For Loop

arr = np.empty(1_000_000, dtype=np.int32)
for i in range(1_000_000):
    arr[i] = i

Block 3: No Loop At All

Better yet, if you know the pattern, skip the loop entirely:

arr = np.arange(1_000_000, dtype=np.int32)

This is the fastest and most efficient way to create a NumPy array of sequential integers, both in terms of speed and memory efficiency.

  • Block 1 is the slowest because appending to a Python list is not optimized for numeric operations and the subsequent conversion to NumPy creates an additional copy, making it memory-inefficient.
  • Block 2 eliminates the list but still uses a Python-level for-loop, which can’t compete with the compiled efficiency of internal NumPy functions.
  • Block 3 leverages NumPy’s internal implementation, resulting in highly optimized, compiled code for memory allocation and assignment, making it significantly faster and more memory-efficient than the other two.
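
To see the gap on your own machine, here’s a minimal sketch that times all three blocks with time.perf_counter (exact numbers will vary):

import numpy as np
from time import perf_counter

def block1():
    data = []
    for i in range(1_000_000):
        data.append(i)
    return np.array(data)

def block2():
    arr = np.empty(1_000_000, dtype=np.int32)
    for i in range(1_000_000):
        arr[i] = i
    return arr

def block3():
    return np.arange(1_000_000, dtype=np.int32)

for block in (block1, block2, block3):
    start = perf_counter()
    block()
    print(f"{block.__name__}: {perf_counter() - start:.4f} seconds")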

4. Use In-Place Operations

In-place operations modify the array directly, saving both time and memory.

arr = np.arange(10**7)
arr *= 2

This is highly optimized because it uses NumPy’s vectorized operations to efficiently create and modify large arrays in memory without Python-level loops.

What are some Key Optimization Points of In-Place Operations?

  • Vectorization: Operations apply to the entire array at once, taking full advantage of low-level optimizations and CPU features.
  • Memory Efficiency: No intermediate Python lists or per-element assignments are involved, so memory allocation and reuse are optimal.
  • Speed: The combination of contiguous data storage and vectorized instructions leads to performance many times faster than equivalent pure-Python loops.

Use in-place methods like arr.sort() (unlike np.sort(arr), which returns a new sorted copy), or operators like += and *= to update arrays in place.
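
To convince yourself an operation really happened in place, you can check whether the result still shares memory with the original. A small sketch using np.shares_memory:

import numpy as np

arr = np.arange(10)
original = arr

arr *= 2                                # in-place: the existing buffer is updated
print(np.shares_memory(arr, original))  # True

arr = arr * 2                           # out-of-place: a brand-new array is allocated
print(np.shares_memory(arr, original))  # False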

5. Work with Views, Not Copies

Slicing creates a view instead of copying, which is much faster.

A View is a new array object that shares the same data buffer as the original array but may have different metadata (like shape or strides), while a Copy creates a completely new array with its own independent data.

Changes to the View affect the Original array, and changes to the Original affect the View because they reference the same underlying data. But modifying a Copy has no effect on the Original array, and vice versa.

Views are typically created by slicing or reshaping operations, while Copies require extra memory allocation and take more time because the data must be duplicated.

large_arr = np.arange(10**7)
view = large_arr[100:200]

Views are faster and more memory-efficient because no new data is allocated or copied. Only a new window onto the existing data is created, saving both time and memory.

But since modifying a View changes the Original array, we have to be careful with Views as well: unintended changes to the original data can occur if we modify a View. 😬
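
Here’s a small sketch that makes the View vs. Copy behaviour concrete:

import numpy as np

arr = np.arange(10)

view = arr[2:5]           # slicing returns a View onto the same buffer
view[0] = 99
print(arr[2])             # 99: the original array was modified

copy = arr[2:5].copy()    # .copy() allocates independent data
copy[0] = -1
print(arr[2])             # still 99: the original is untouched

print(view.base is arr)   # True: a View keeps a reference to its parent array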

6. Leverage Broadcasting

Broadcasting lets NumPy perform operations across arrays of different shapes without loops or extra memory.

matrix = np.ones((3, 3))
vector = np.array([1, 2, 3])

result = matrix + vector  # Vector is "broadcast" across rows

Let’s take another example:

a = np.array([1, 2, 3]) # shape (3,)
b = np.array([[10], [20], [30]]) # shape (3,1)

result = a + b # shape (3,3) broadcasted addition

The smaller array is virtually stretched to the larger array’s shape by reinterpreting strides and metadata, without duplicating the data in memory. This avoids the need to allocate a large array filled with repeated data, which actually saves memory.
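
You can see this virtual stretching directly with np.broadcast_to, which returns a read-only view of the stretched array without copying anything. A minimal demonstration:

import numpy as np

a = np.array([1, 2, 3])                    # only 3 elements of actual data
stretched = np.broadcast_to(a, (1000, 3))  # behaves like a (1000, 3) array

print(stretched.strides)  # (0, 8) for int64: a stride of 0 means every row reuses the same data
print(stretched.shape)    # (1000, 3), yet no new data was allocated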

If you want to know more about Broadcasting, you can also check out my Broadcasting and Vectorized Operations in NumPy article. Or check out the entire NumPy for DS & DA list.

7. Profile Before You Optimize

Not sure where the slowdown is? Use simple profiling.

Profiling tools measure how long different parts of our code take to run, how much memory they use, and where most of the execution time is spent.

%timeit arr * 2

%timeit is an IPython magic command built on Python’s timeit module. When running %timeit arr * 2, it repeatedly executes the expression arr * 2 many times (usually thousands) and records execution times.

%timeit automatically handles setup and multiple runs to provide statistically significant timing results without manual intervention.

arr = np.arange(int(1e6))
%timeit arr * 2
# 3.99 ms ± 741 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

  • 3.99 ms is the average time taken to execute the code arr * 2 once.
  • This average is calculated over multiple runs, where each run executes the statement many times. In our case, it’s 100 loops per run.
  • ± 741 μs (microseconds) is the standard deviation, showing how much the execution time varied between runs.
  • The measurement is based on 7 separate runs of the timing experiment.
  • Each run performs the operation 100 times (100 loops each) to gather sufficient data for accuracy.

%timeit basically ensures reliable timing by running the code repeatedly and reporting the mean and variability over multiple runs, giving a robust estimate of typical performance.

In Jupyter or IPython, you can use this to measure execution time. Focus on the slowest parts; don’t guess. 😂
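
If you’re working in a plain Python script rather than Jupyter or IPython, the standard timeit module does the same job. A minimal sketch:

import timeit
import numpy as np

arr = np.arange(1_000_000)

# Run the expression 100 times per repeat, 5 repeats; report the best per-call time
times = timeit.repeat("arr * 2", globals=globals(), number=100, repeat=5)
print(f"Best of 5: {min(times) / 100 * 1e3:.3f} ms per call")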

8. Go Even Faster with Numba or Cython

If you absolutely need more speed, tools like Numba can compile Python code to machine code.

from numba import njit

@njit
def double(arr):
    return arr * 2

Numba is a just-in-time (JIT) compiler for Python that can significantly accelerate numerical computations, especially those involving loops over NumPy arrays. Well, I have not personally worked extensively with Numba yet. Maybe someday, if I ever find a time machine!😂

That said, Numba is widely recognized for its ability to compile Python code into highly optimized machine code, often resulting in performance improvements beyond standard NumPy operations.
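
Where Numba tends to shine most is loop-heavy code with data dependencies that has no clean vectorized form. Here’s a hedged sketch (capped_running_sum is just a hypothetical example; you’ll need pip install numba first):

import numpy as np
from numba import njit

@njit
def capped_running_sum(arr, cap):
    # A running sum that resets whenever it exceeds a cap:
    # awkward to vectorize, but trivial to write as a loop
    out = np.empty_like(arr)
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i]
        if total > cap:
            total = 0.0
        out[i] = total
    return out

data = np.random.rand(1_000_000)
result = capped_running_sum(data, 100.0)  # first call compiles; later calls run at machine speed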

But start with NumPy best practices first; often they’re all you need.

Key Takeaways

  • Vectorize everything you can, ditch Python loops.😛
  • Pick the smallest suitable data type.
  • Pre-allocate arrays instead of appending.
  • Use in-place operations and views to save memory.
  • Trust NumPy’s built-in functions, they’re fast for a reason.😌

Speed matters, especially with big data. A few simple habits can turn slow code into lightning-fast analytics.

NumPy already gives you speed. But with these tricks, you’ll squeeze out every last drop of performance, and spend more time analyzing data, less time waiting for code to run.

And that’s all for today!


Thanks for reading! 😊
