Last Updated on April 6, 2023 by Editorial Team
Author(s): Ulrik Thyge Pedersen
Originally published on Towards AI.
Better Together — Four Examples of How Rust Makes Python Better
Leverage Rust to Optimize your Codebase by Boosting Performance and Safety
Python is a popular programming language known for its ease of use, flexibility, and readability. However, its dynamic nature and interpreted execution make it slower than statically typed, compiled languages like C and Rust. To overcome this limitation, Python developers have been leveraging Rust, a systems programming language, to improve their code’s performance and safety.
Rust offers features like memory safety, thread safety, and zero-cost abstractions, making it an excellent choice for building high-performance libraries. Because Rust can expose and call C-compatible interfaces, it plugs into the C API that CPython provides for extension modules, which makes it a strong candidate for building Python libraries that require high-speed execution and careful memory management.
In this article, we’ll explore some of the popular Python libraries that are either fully or partially written in Rust, discussing their features, benefits, and how they can help Python developers optimize their codebase. Let’s get started!
Polars — Fast DataFrames
Polars is a blazingly fast data manipulation library for Python and Rust that is fully implemented in Rust. It provides a DataFrame API that is similar to the pandas library and can handle large datasets in memory, outperforming pandas in terms of speed and memory usage. Polars is designed to provide a familiar and user-friendly interface while also taking advantage of Rust’s memory safety and performance features.
Here’s an example of how Polars can be used to filter rows from a DataFrame:
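A minimal sketch of such a filter, assuming the third-party polars package is installed. The sample data, the column names, and the `filter_rows` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical sample data: one list per column.
PEOPLE = {
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 41],
}


def filter_rows(columns, min_age):
    """Return the rows whose "age" value is greater than min_age."""
    # Imported inside the function so this snippet still loads on machines
    # where polars (pip install polars) is not available.
    import polars as pl

    df = pl.DataFrame(columns)
    # pl.col("age") > min_age builds an expression that Polars evaluates
    # in Rust; filter() keeps only the rows where it is true.
    filtered = df.filter(pl.col("age") > min_age)
    # Convert back to plain Python dictionaries, one per row.
    return filtered.to_dicts()
```

Calling `filter_rows(PEOPLE, 30)` keeps the rows for Bob and Carol and drops Alice.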
In addition to its performance benefits, Polars also provides a number of memory management features that can help prevent memory leaks and improve the overall stability of your Python codebase. By leveraging Rust’s ownership and borrowing system, Polars ensures that memory is managed safely and efficiently without the need for manual memory management by the user.
Overall, Polars is an excellent choice for data-intensive Python applications that require high-speed data manipulation and memory management. Its API is not a drop-in replacement for pandas, but it will feel familiar to pandas users, and DataFrames can be converted to and from pandas, which makes it practical to adopt Polars incrementally in existing codebases.
TikToken — OpenAI’s Tokenizer
TikToken is a Python library developed by OpenAI that provides fast and efficient tokenization of natural language text. The library is partially implemented in Rust, which helps to improve its performance and memory management.
Here’s an example of how to use the TikToken library to tokenize a piece of text:
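A minimal sketch, assuming the third-party tiktoken package is installed. The `tokenize_roundtrip` helper is hypothetical, and `cl100k_base` is just one of the encodings the library ships; the first use of an encoding downloads its vocabulary, so network access is assumed.

```python
def tokenize_roundtrip(text, encoding_name="cl100k_base"):
    """Encode text into token ids, then decode the ids back into text."""
    # Imported inside the function so this snippet still loads on machines
    # where tiktoken (pip install tiktoken) is not available.
    import tiktoken

    # Look up a pretrained BPE encoding by name.
    encoding = tiktoken.get_encoding(encoding_name)
    # encode() returns a list of integer token ids.
    token_ids = encoding.encode(text)
    # decode() maps the ids back to the original string.
    return token_ids, encoding.decode(token_ids)
```

The length of the returned id list is the token count for the text, which is the number that matters when estimating how much of a model's context window a prompt will use.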
TikToken implements byte pair encoding (BPE) and ships several pretrained encodings, such as cl100k_base and p50k_base, each matching specific OpenAI models. Other algorithms such as WordPiece and Unigram, along with training custom tokenizers, are features of Hugging Face’s Tokenizers library (also written in Rust) rather than of TikToken.
By leveraging Rust’s performance and memory management features, TikToken can handle large datasets efficiently, making it an excellent choice for natural language processing (NLP) applications that require fast and accurate tokenization.
In addition to its speed and efficiency, TikToken provides control over how special tokens are treated during encoding, batch encoding of many texts at once, and a lookup that returns the right encoding for a given model name. Padding, truncation, and masking are left to the calling code.
Overall, TikToken is a powerful library for Python developers working with natural language text, offering both performance and flexibility for a variety of NLP applications.
River — Online Machine Learning
River is a Python library for online machine learning developed by Online-ML. The library is written mostly in Python, with some performance-critical components implemented in Rust, which helps keep its per-observation updates fast.
One of the key benefits of River is its ability to train machine learning models on data streams, which are continuous and potentially infinite sources of data. This is a valuable feature for applications such as fraud detection, anomaly detection, and other use cases that require real-time analysis of incoming data.
Here’s an example of how to use River to train a logistic regression model on a stream of data:
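A minimal sketch, assuming the third-party river package is installed. The synthetic stream, its feature names, the labelling rule, and the `make_stream` and `train_online` helpers are all hypothetical stand-ins for a real data feed.

```python
import random


def make_stream(n, seed=42):
    """Yield (features, label) pairs one at a time, like a live data feed."""
    rng = random.Random(seed)
    for _ in range(n):
        x = {"amount": rng.uniform(0, 100), "hour": rng.uniform(0, 24)}
        # Hypothetical labelling rule standing in for real ground truth.
        y = x["amount"] > 50
        yield x, y


def train_online(stream):
    """Train a logistic regression one observation at a time."""
    # Imported inside the function so this snippet still loads on machines
    # where river (pip install river) is not available.
    from river import compose, linear_model, metrics, preprocessing

    model = compose.Pipeline(
        preprocessing.StandardScaler(),      # running mean/variance scaling
        linear_model.LogisticRegression(),   # updated incrementally
    )
    accuracy = metrics.Accuracy()
    for x, y in stream:
        # Predict first, then learn: each example is scored before the
        # model has seen its label (progressive validation).
        y_pred = model.predict_one(x)
        accuracy.update(y, y_pred)
        model.learn_one(x, y)
    return model, accuracy.get()
```

A call such as `train_online(make_stream(1000))` returns the fitted pipeline together with its running accuracy. Because each observation is processed once and then discarded, memory use stays flat no matter how long the stream runs.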
By implementing performance-critical components in Rust, River is able to take advantage of Rust’s memory management and low-level optimizations. Rust’s ownership system makes it possible to avoid unnecessary copies of data, and because Rust has no garbage collector, the compiled code does not pause for collection. Rust also provides high-level abstractions over low-level operations, such as SIMD instructions, which can improve the performance of numerical code.
Overall, River is a powerful library for online machine learning in Python, offering both speed and flexibility for a variety of use cases. With its fast performance and support for data streams, River is an excellent choice for developers working on real-time machine learning applications.
HyperJSON — Hyper Fast JSON
HyperJSON is a Python library for encoding and decoding JSON that is implemented in Rust. This library is designed to be faster and more efficient than the built-in json module in Python.
Here’s an example of how to use HyperJSON to encode and decode JSON:
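A minimal sketch, assuming the third-party hyperjson package exposes the same `dumps` and `loads` functions as the standard library, as its drop-in design suggests. If the package is not installed, the snippet falls back to the built-in json module; the `record` data is hypothetical.

```python
try:
    # Third-party, Rust-backed implementation (pip install hyperjson).
    import hyperjson as json_codec
except ImportError:
    # Standard-library fallback with the same function names.
    import json as json_codec

# Hypothetical sample payload.
record = {
    "library": "hyperjson",
    "tags": ["rust", "python", "json"],
    "stable": True,
    "downloads": 12345,
    "maintainer": None,
}

# Encode the Python object into a JSON string.
encoded = json_codec.dumps(record)

# Decode the JSON string back into Python objects.
decoded = json_codec.loads(encoded)
```

Binding the module to a neutral name such as `json_codec` keeps the rest of the code unchanged whichever implementation is available, which makes it easy to benchmark one against the other.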
By implementing the library in Rust, HyperJSON is able to take advantage of Rust’s performance benefits, including its memory safety and efficient handling of memory. This makes it possible for HyperJSON to encode and decode JSON faster than the built-in json module in Python.
HyperJSON is designed as a drop-in replacement for the built-in json module, so existing calls to dumps and loads can usually be switched over with a one-line import change. It also aims to handle special floating-point values such as NaN and infinity. The built-in json module accepts these by default as well, writing them as the non-standard tokens NaN and Infinity, so this is a matter of compatibility rather than an extra capability.
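For comparison, the standard library's handling of NaN and infinity can be checked directly. This sketch uses only the built-in json module; the `strict_dumps` helper is hypothetical and shows what happens when the non-standard tokens are disallowed.

```python
import json
import math

# With default settings, json writes special floats as non-standard tokens.
encoded_nan = json.dumps(float("nan"))
encoded_inf = json.dumps(float("inf"))

# The same tokens are accepted when decoding.
decoded = json.loads("[NaN, Infinity, -Infinity]")


def strict_dumps(value):
    """Encode value as strictly valid JSON, or return None if impossible."""
    try:
        # allow_nan=False makes json raise ValueError for NaN and infinities.
        return json.dumps(value, allow_nan=False)
    except ValueError:
        return None
```

Any library that emits these tokens produces output that strict JSON parsers in other languages may reject, so it is worth deciding up front whether such values should be encoded at all.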
Overall, HyperJSON is a powerful and efficient library for working with JSON data in Python. Its use of Rust provides significant performance benefits over the built-in json module, making it a great choice for applications that require fast and efficient JSON encoding and decoding.
In this article, we’ve explored several Python libraries that are written in Rust, or partially written in Rust, and how they can improve the safety and performance of Python code.
- Polars is a data manipulation library that is implemented entirely in Rust, providing high performance and efficient memory management.
- TikToken is OpenAI’s tokenizer, partially written in Rust, that provides fast and memory-efficient tokenization of natural language text.
- River from Online-ML is a machine learning library that integrates Rust code to provide a faster and more efficient implementation of machine learning models in Python.
- Finally, HyperJSON is a JSON library implemented in Rust that provides faster and more efficient encoding and decoding of JSON data.
Overall, these libraries demonstrate the power and flexibility of Rust as a language for implementing high-performance code that can integrate with Python. By leveraging Rust’s strengths, these libraries provide faster and more efficient implementations of functionality that is critical to many Python applications. As such, these libraries are valuable tools for developers looking to optimize their Python code for speed and efficiency.
Thank you for reading my story!