
Volga — Open-source Feature Engine For Real-time AI — Part 1

Last Updated on April 7, 2024 by Editorial Team

Author(s): Andrey Novitskiy

Originally published on Towards AI.

This is the first part of a two-part series describing the background and motivation behind Volga. For technical details, see the second part.

Volga river

TL;DR

Volga is an open-source, self-serve, scalable data/feature calculation engine tailored for modern real-time AI/ML applications. It features a convenient Pandas-like API to define data entities, sources, and online/offline pipelines; consistent online and offline feature calculation semantics; configurable hot and cold storage; feature lookups; real-time serving; and on-demand request-time calculations.

It is built on top of Ray and aims to be a completely standalone, self-serve system that removes any heavy-weight dependency on general-purpose data processors (Flink, Spark) or cloud-based feature platforms (Tecton.ai, Fennel.ai). It can run on a laptop or on a 1,000-node cluster.

GitHub — https://github.com/anovv/volga

Content:

  • Real-time ML background and infra challenges
  • The rise of Feature Stores
  • 3 Types of Features by latency and Computation Engine(s) requirements
  • The rise of managed Feature Platforms
  • The Problem
  • Introducing Volga

Real-time ML background and infra challenges

While the topic of real-time machine learning is vast enough to deserve a book of its own, the goal of this article is to give the reader a minimal understanding of the infrastructure challenges in this area and to demonstrate the problem we are trying to solve (for a deeper dive into the subject, Chip Huyen’s blog and the Fennel.ai blog are good places to start).

In short, real-time ML covers two ideas:

(1) using streaming (real-time) data to calculate ML model inputs (features) with higher accuracy and lower latency as soon as the data arrives, and to instantly infer results from the model, and

(2) adapting the model to changing conditions (data drift, concept drift) via retraining, online learning, or incremental learning.

Our problem space is (1).

Example use cases for real-time ML systems are fraud detection, anomaly/outlier detection, recommendation systems, search personalization, RAG (Retrieval-Augmented Generation) for LLMs, trading systems in financial markets, etc.

While building production-ready real-time ML systems is not an easy task, many practitioners agree that the main challenge of building a complete end-to-end solution is creating and maintaining a robust feature engineering/calculation/storage pipeline. To better understand all nuances of building feature calculation pipelines for real-time ML, you can read these blog posts by Fennel and Chip Huyen. In short, key requirements for a mature production-ready feature calculation system are:

  • Feature metadata storage to ensure consistency in sharing and authoring (feature schema/data type, unique name/id, authors/owners, etc.)
  • Separate storages for the different feature access latency requirements of inference and offline training: hot (cache, e.g. Redis, Cassandra, RocksDB) and cold (offline, e.g. SQL databases, data lakes); see the sketch after this list
  • Ability to compute features in real time for both the online (inference) and offline (training) settings
  • Consistency between online and offline feature definitions (the same code defines online and offline calculation) and calculation semantics (the same result when computed on real-time data and on historical data)
  • Data-scientist-friendly API (preferably Python)
  • Typical platform requirements: scalability, multitenancy, fault tolerance, etc.
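To make the hot/cold split concrete, here is a minimal hypothetical sketch (not tied to any particular feature store; the key layout, paths, and column names are assumptions) of the two read/write paths: the latest value goes into a Redis hash for millisecond lookups at inference time, while the full timestamped history sits in Parquet for building training sets.

```python
# Hypothetical hot/cold storage layout for one feature; key schema and paths are illustrative.
import pandas as pd
import redis

HOT = redis.Redis(host="localhost", port=6379, decode_responses=True)

def write_online(user_id: int, value: float) -> None:
    # Hot path: upsert the latest value for low-latency reads at inference time.
    HOT.hset(f"feature:avg_purchases_7d:{user_id}", mapping={"value": value})

def read_online(user_id: int) -> float:
    # Inference-time lookup, typically a millisecond-level round trip.
    return float(HOT.hget(f"feature:avg_purchases_7d:{user_id}", "value"))

def read_offline(path: str = "features/avg_purchases_7d.parquet") -> pd.DataFrame:
    # Cold path: full timestamped history used to assemble training sets.
    return pd.read_parquet(path)
```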

The Rise of Feature Stores

The first iteration on solving these problems was the introduction of Feature Stores. At the time these systems were being introduced, one of the biggest hurdles in ML engineering was the lack of feature standardization: many teams defined custom Python scripts and ad hoc Pandas/Spark functions without a centralized data model or storage, which led to bugs and slow development cycles.

A Feature Store is essentially a storage layer providing consistent feature metadata models (schemas, data types, unique ids/names, owners, ranges, lineage) as well as connectors to hot (Redis) and cold (SQL, data lakes) storage and to computation engines. A notable example of a feature store is Feast.
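As an illustration, this is roughly what a feature definition looks like in Feast's documented quickstart (a sketch based on Feast's public examples; exact class names and arguments vary between Feast versions, and the entity, fields, and file path below are made up):

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# The entity is the join key that feature values are attached to.
user = Entity(name="user", join_keys=["user_id"])

# Cold/offline source of historical, timestamped feature values.
purchases_source = FileSource(
    path="data/user_purchase_stats.parquet",
    timestamp_field="event_timestamp",
)

# The feature view ties metadata (name, schema, types, TTL) to the source;
# Feast can then materialize it into the configured online (hot) store.
user_purchase_stats = FeatureView(
    name="user_purchase_stats",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_purchases_7d", dtype=Float32),
        Field(name="purchase_count_7d", dtype=Int64),
    ],
    source=purchases_source,
)
```

Note that nothing in this definition says how `avg_purchases_7d` is computed; that is exactly the gap discussed next.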

While Feature Stores have found adoption among users, they solved only the first two problems from the above list; managing the compute layer and the computation model was still up to the user. You can read more about the differences between a Feature Store and a Feature Platform in the Feast FAQ (https://feast.dev/).

3 Types of Features by Latency and Computation Engine(s) Requirements

To better understand the requirements for the compute layer of a feature platform, let’s look at the different feature types based on how they are used and their latency demands. Most current Feature Platforms distinguish three types:

Offline Features (other names — Batch, Backfill)

  • Calculated offline on historical data, used for model training/validation
  • Example — the average number of purchases a user made in a week, calculated at the timestamp of each purchase over the span of the last year.
  • Latency requirements are dictated by the model training process — minutes to days.
  • Compute Infra — Spark, Bounded Flink streams in some cases, custom engines

Online Features (other names — Streaming, Near Real-time, sometimes referred to as Real-time)

  • Calculated as soon as new data entry arrives, used for model inference.
  • Example — same as above, but recalculated as soon as each new purchase event is received by the system
  • Latency — milliseconds to seconds
  • Compute Infra — Flink, Spark Streaming, custom streaming engines

On-Demand Features (other names — Real-time)

  • Stateless transformations calculated at inference time, used in cases when the calculation is too resource-expensive for streaming or requires data available only at inference time
  • Examples — call to a third party service (e.g. GPS tracking), query to a meta-model whose outputs are used as inputs for this model, dot product for large embeddings, user’s age at millisecond precision
  • Latency — milliseconds
  • Compute Infra — Fleet of Python workers — Celery or custom
Feature Types

As seen from the above, in order to support the whole spectrum of feature types, the system needs to rely on 2–3 different compute engines and to provide a common computation model along with a cross-language user-facing interface (Flink, for example, exposes a Java API): a shared data model and API to keep feature declarations consistent across all feature types, and a proper computation model to keep streaming and batch features consistent with each other.
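To see why that consistency is hard, consider one logical feature, the number of purchases a user made in the last 7 days, written twice: once as a Pandas batch job for backfills and once as a hand-rolled streaming aggregator for serving (in a real platform these two halves would typically live in Spark and Flink respectively; the schema and window semantics below are illustrative assumptions). Any drift between the two implementations, for example in how window boundaries are treated, silently skews training data against served values.

```python
# One logical feature ("purchases in the last 7 days") implemented twice.
# Column names and window semantics are illustrative assumptions.
from collections import defaultdict, deque
from datetime import timedelta

import pandas as pd

# --- Offline / batch version (in practice a Spark job) ---------------------
def purchases_7d_batch(events: pd.DataFrame) -> pd.DataFrame:
    # events: one row per purchase, columns [user_id, timestamp].
    events = events.sort_values("timestamp").set_index("timestamp")
    return (
        events.groupby("user_id")["user_id"]
        .rolling("7D").count()          # pandas uses a (t - 7d, t] window here
        .rename("purchases_7d")
        .reset_index()
    )

# --- Online / streaming version (in practice a Flink job, in Java) ---------
class Purchases7dStream:
    def __init__(self):
        self._windows = defaultdict(deque)  # user_id -> timestamps inside the window

    def on_event(self, user_id, ts):
        window = self._windows[user_id]
        window.append(ts)
        cutoff = ts - timedelta(days=7)
        while window and window[0] <= cutoff:  # should the boundary be < or <=? easy to get wrong
            window.popleft()
        return len(window)  # feature value as of this event
```

Keeping these two code paths semantically identical, across teams, languages, and engines, is exactly the consistency burden described above.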

While possible, orchestrating these compute engines becomes significantly challenging in a platform setting, where a single job needs to synchronize scalability requirements, resource isolation, and fault tolerance across Spark, Flink, and a fleet of Python executors. Building and maintaining such a system requires a dedicated team of engineers and ops and is infeasible for most medium-sized engineering teams.

The rise of managed Feature Platforms

To solve the challenges above, managed Feature Platforms were introduced. In a nutshell, a Feature Platform = Feature Store + Feature Computation Engine(s), where the computation engine(s) are a combination of the different compute engines mentioned above (open-source or in-house), orchestrated together, platformized, and served as a PaaS. Notable examples of such platforms are Tecton.ai and Fennel.ai.

Fennel.ai Architecture with custom streaming engine and separate On-Demand workers fleet

From a user’s perspective, the biggest and most obvious problem with managed Feature Platforms is vendor lock-in: they are built on closed-source infra with hard-coded data models and are tied to a specific cloud vendor and pricing plan.

The Problem

In the current state of things, if a user wants to set up a real-time feature computation infra, they have the following choices:

  • Outsource to an existing managed Feature Platform which leads to a vendor lock-in (Tecton.ai, Fennel.ai, FeatureForm, Chalk.ai, etc.)
  • Build in-house infra using existing Feature Stores on top of multiple compute engines, plus an API and computation model spanning different languages (Java + Python) to unify online and offline pipelines, plus a unified orchestration layer. This comes at an extremely high development and operational cost, especially in platform scenarios (synchronizing Spark and Flink job scaling, isolation, and fault tolerance in a multitenant setting is a hell of a task).
  • Use a provider that allows an on-premises setup (e.g. Fennel), which combines the two options above: maintaining closed-source infra in-house on the provider’s cloud resources, plus vendor lock-in to the provider’s cloud of choice (e.g. AWS in Fennel’s Deployment Model)

As a result, there are currently no open-source, self-serve feature calculation engines that provide a single, unified, consistent computation layer for all feature types and solve all of the needs above (online/offline consistency, a single computation platform, easy scalability and multitenancy, cloud agnosticism, easy local prototyping, mature cloud integration).

Introducing Volga

Volga is designed to solve the problems above. It aims to provide a single computation layer for all feature types as well as a simple, unified user-facing API to define online/offline/on-demand pipelines in a consistent manner, along with ease of experimentation, deployment, and maintenance.

Volga is built on top of Ray, a powerful distributed runtime with deep and mature integration with modern cluster managers: native Kubernetes support via KubeRay, native AWS support on bare EC2 instances, or a fully managed option in Anyscale. Ray also offers easy local setup, a developer-friendly Python API, a large and mature community, and an AI-first ecosystem that can be used to build an end-to-end real-time ML platform.

At Volga’s heart is its custom streaming engine, built with Ray Actors, ZeroMQ for messaging, and some Rust for performance-critical parts.
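As a taste of that building block, here is a single-node sketch of two Ray actors exchanging records over a ZeroMQ PUSH/PULL pair (this is not Volga's actual code; the actor names, address, and wire format are assumptions made for illustration):

```python
# Single-node sketch: two Ray actors passing records over ZeroMQ PUSH/PULL.
# Not Volga's implementation; names, port, and JSON framing are illustrative.
import ray
import zmq

ADDR = "tcp://127.0.0.1:5555"  # assumes a free local port

@ray.remote
class Producer:
    def __init__(self, addr: str):
        self._ctx = zmq.Context()
        self._sock = self._ctx.socket(zmq.PUSH)
        self._sock.bind(addr)

    def send(self, records: list) -> int:
        for r in records:
            self._sock.send_json(r)  # frame each record as JSON
        return len(records)

@ray.remote
class Consumer:
    def __init__(self, addr: str):
        self._ctx = zmq.Context()
        self._sock = self._ctx.socket(zmq.PULL)
        self._sock.connect(addr)

    def receive(self, n: int) -> list:
        return [self._sock.recv_json() for _ in range(n)]

if __name__ == "__main__":
    ray.init()  # local Ray; on a cluster you would pass an address instead
    producer, consumer = Producer.remote(ADDR), Consumer.remote(ADDR)
    producer.send.remote([{"user_id": 1, "amount": 9.99}])
    print(ray.get(consumer.receive.remote(1)))  # [{'user_id': 1, 'amount': 9.99}]
```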

Volga’s API is inspired by Fennel and lets users define strongly-typed data schemas, online/offline/on-demand pipelines, sources and transformations.
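The concrete API is covered in Part 2. As a taste of the underlying principle (define the logic once, run it consistently online and offline), here is a hypothetical, framework-free sketch where a single pure Python feature function drives both an offline backfill over a Pandas DataFrame and per-event online updates; all names here are made up for illustration and are not Volga's API:

```python
# Hypothetical "define once, run online and offline" sketch; not Volga's API.
from datetime import datetime, timedelta

import pandas as pd

WINDOW = timedelta(days=7)

def purchases_7d(timestamps: list, now: datetime) -> int:
    """Single source of truth for the feature logic."""
    return sum(1 for ts in timestamps if now - WINDOW < ts <= now)

# Offline / backfill mode: replay historical events through the same logic.
def backfill(events: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for user_id, grp in events.sort_values("timestamp").groupby("user_id"):
        ts = list(grp["timestamp"])
        rows += [
            {"user_id": user_id, "timestamp": t, "purchases_7d": purchases_7d(ts[: i + 1], t)}
            for i, t in enumerate(ts)
        ]
    return pd.DataFrame(rows)

# Online mode: apply the same logic to each incoming event.
def on_event(history: list, event_ts: datetime) -> int:
    history.append(event_ts)
    return purchases_7d(history, now=event_ts)
```

Because both modes call the same `purchases_7d`, backfilled training data and served values cannot drift apart, which is the online/offline consistency property discussed throughout this post.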

Features:

  • Utilizes a custom scalable stream processing engine using Ray Actors for execution, ZeroMQ for messaging, and Rust for some performance-critical parts (experimental). Kappa architecture — no Flink or Spark
  • Built on top of Ray — Easily integrates with Ray ecosystem (cluster/job/cloud management, model training/serving, zero-copy data transfers, etc.) as well as your custom ML infrastructure
  • Kubernetes ready, no vendor lock-in — use KubeRay to run multitenant scalable jobs or create your own deployment/scheduling logic in pure Python
  • Pure Python, no heavy JVM setups — minimal setup and maintenance efforts in production
  • Standalone and self-serve — launch on your laptop or a cluster with no heavy-weight external dependencies
  • Cold + hot storage model allows for customizable offline storage and caches, and fast real-time feature queries
  • Easy to use declarative Pandas-like API to simultaneously define online and offline feature pipelines, including operators like transform, filter, join, groupby/aggregate, drop, etc.
  • [Experimental] Perform heavy embedding dot products, query meta-models, or simply calculate a user’s age in milliseconds at request time using On-Demand Features

The second part of the series dives into Volga’s architecture, API design and other technical details.
