
DNNs vs Traditional Tree-Based Models for E-Commerce Ranking
Last Updated on May 6, 2025 by Editorial Team
Author(s): Nikhilesh Pandey
Originally published on Towards AI.

Context
Having spent nearly a decade building and optimizing advertising systems at Meta, I’ve seen firsthand how the online advertising ecosystem has evolved — from basic impression-based strategies to today’s complex, conversion-optimized platforms. Drawing from that experience, let’s unpack how modern ad ranking systems work and why they’re central to the success of digital advertising.
Online advertising platforms serve multiple objectives, including brand awareness, promotions, conversions, and retargeting. The success of an ad campaign is measured using different metrics, such as ad impressions, ad clicks, and conversions — ranked in increasing order of importance depending on the campaign goal. Correspondingly, different pricing models are used, including CPM (Cost Per Mille), CPC (Cost Per Click), and CPA (Cost Per Acquisition), each aligning with specific advertiser objectives.
E-commerce advertising platforms like DoorDash and Airbnb prioritize conversions over impressions or clicks, making Cost Per Acquisition (CPA) a highly effective pricing model. Unlike CPC (Cost Per Click) or CPM (Cost Per Mille), CPA ensures that advertisers only pay when a desired action — such as a purchase, signup, or subscription — is completed.
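To make the pricing models concrete, here is a toy sketch of what the same campaign would cost an advertiser under each model. All rates and counts below are invented for illustration:

```python
# Illustrative only: cost of one campaign under the three pricing models.
# The funnel numbers and rates are made up, not from any real platform.

def campaign_cost(impressions, clicks, conversions,
                  cpm_rate=None, cpc_rate=None, cpa_rate=None):
    """Return campaign cost under whichever pricing model's rate is given."""
    if cpm_rate is not None:      # CPM: pay per 1,000 impressions
        return impressions / 1000 * cpm_rate
    if cpc_rate is not None:      # CPC: pay per click
        return clicks * cpc_rate
    if cpa_rate is not None:      # CPA: pay only on completed actions
        return conversions * cpa_rate
    raise ValueError("one pricing rate must be provided")

# Funnel: 100k impressions -> 2k clicks -> 50 conversions
print(campaign_cost(100_000, 2_000, 50, cpm_rate=5.0))   # 500.0
print(campaign_cost(100_000, 2_000, 50, cpc_rate=0.50))  # 1000.0
print(campaign_cost(100_000, 2_000, 50, cpa_rate=12.0))  # 600.0
```

Under CPA, the advertiser's spend tracks realized value directly, which is why conversion-focused platforms favor it.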
This makes post-view conversion rate (CVR) prediction the cornerstone of real-time ad auctions. Accurate CVR predictions are essential for ranking high-quality ads, optimizing auction efficiency, and maintaining a fair marketplace that benefits both advertisers and consumers. A well-optimized CVR model enables platforms to maximize ad relevance, improve user experience, and enhance overall ad ecosystem efficiency.
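As a simplified illustration of why pCVR sits at the heart of the auction, a CPA auction can rank candidates by expected value, i.e., the CPA bid multiplied by the predicted conversion rate. This is a generic expected-value sketch, not DoorDash's actual auction mechanics:

```python
# Simplified expected-value ranking for a CPA auction (generic sketch):
# expected revenue of showing an ad = cpa_bid * predicted CVR.

def rank_ads(candidates):
    """candidates: list of (ad_id, cpa_bid, predicted_cvr) tuples."""
    return sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)

ads = [
    ("ad_a", 10.0, 0.02),  # expected value 0.20
    ("ad_b", 4.0, 0.08),   # expected value 0.32
    ("ad_c", 6.0, 0.03),   # expected value 0.18
]
print([ad_id for ad_id, _, _ in rank_ads(ads)])  # ['ad_b', 'ad_a', 'ad_c']
```

Note that a miscalibrated pCVR directly reorders the auction, which is why prediction quality is treated as the cornerstone metric.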
Why Tree-Based Models No Longer Suffice
Ad ranking systems for large advertising platforms have evolved beyond legacy approaches like manual scoring functions. In recent years, tree-based models, particularly Gradient-Boosted Decision Trees (GBDT), have been widely adopted due to their efficiency in handling large-scale, high-dimensional, and heterogeneous data. These models delivered significant performance gains, marking one of the most substantial improvements in ad ranking. However, their impact eventually plateaued, revealing inherent limitations in capturing the complexity and diversity of consumer interactions.
As a result, further advancements in e-commerce ad ranking necessitated a shift beyond tree-based models, prompting businesses to undertake a complete overhaul of their ad ranking systems. Platforms like Google Play, Airbnb, and DoorDash began exploring neural networks as the next frontier. Extensive research and experimentation established Deep Neural Networks (DNNs) as the superior choice due to their advanced representational learning capabilities. DNNs excel at processing large-scale, heterogeneous data, including temporal trends, contextual signals, and multimodal inputs such as text, images, and graphs. Furthermore, they unlock new opportunities for cross-domain knowledge sharing through transfer learning and enable holistic user behavior modeling via multi-task learning (MTL).
However, the transition to deep learning cannot happen in a single, abrupt shift. For large-scale systems that collect and process online signals, it is far more than just a change in model architecture. Instead, this evolution unfolds through a series of iterative advancements, each tackling new challenges and revealing opportunities for innovation. It requires a comprehensive transformation, refining both offline and online processes through continuous experimentation and optimization.
The remainder of this article explores the high-level approach, challenges, and solutions involved in this transformation, using “DoorDash Home Feed Ads” as a case study.
Transition: Execution and Challenges
Step 1: Defining the Baseline
The foundation of any online ads platform relies on two critical components:
- Model Training Service — An offline process that utilizes historical prediction and engagement logs to train model artifacts.
- Ad Exchange Service — An online system that leverages request metadata and trained model artifacts to rank ads for real-time auctions.
Additionally, robust logging mechanisms are essential to capture high-quality data for continuous model training. Migrating models poses challenges across all these areas, making it crucial to establish a strong baseline before transitioning to a new system.
For a successful migration, the new solution must meet the following key performance criteria:
- Model Quality — The newly trained prediction model should perform at least as well as, if not better than, the existing model.
- System Performance — The online query response time should either improve or maintain parity with the current system, ensuring no significant regression that could impact business value.
- System Cost — The additional capacity costs of the new system must be justified by its overall business impact and performance improvements.
A well-defined baseline ensures that the transition is data-driven, minimizing risks while maximizing business value.
Step 2: Model Training and Evaluation
Due to the skewness in input feature distributions, normalization layers play a crucial role as a pre-processing step before feeding data into the primary neural network. To optimize resource utilization, these preprocessing tasks can be offloaded to cost-efficient CPU pools, while GPU resources are reserved for more computationally intensive operations, such as matrix multiplications.
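The article does not specify the exact transforms DoorDash applies, but a minimal sketch of skew-reducing normalization might compress the heavy tail with a log transform and then z-score the result:

```python
import math

# Minimal sketch of skew-reducing preprocessing (assumed transforms, not
# DoorDash's documented pipeline): log1p to tame heavy-tailed counts,
# then standardization to zero mean / unit variance.

def normalize(values):
    logged = [math.log1p(v) for v in values]  # compress the long tail
    mean = sum(logged) / len(logged)
    var = sum((x - mean) ** 2 for x in logged) / len(logged)
    std = math.sqrt(var) or 1.0               # guard against constant columns
    return [(x - mean) / std for x in logged]

# Order counts are typically heavy-tailed: a few power users dominate.
normed = normalize([0, 1, 2, 5, 1000])
print(round(abs(sum(normed)), 6))  # 0.0 -- zero mean after standardization
```

Because these operations are cheap element-wise math, they are natural candidates for the CPU pools mentioned above.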
Achieving high training throughput requires strategic workload distribution between CPUs and GPUs, ensuring efficient parallelization across the entire pipeline. Well-designed data flow and resource allocation help minimize bottlenecks, maximize hardware efficiency, and accelerate model training.
- Preprocessing: DoorDash leveraged the PyTorch Distributed Data-Parallel (DDP) framework, where parallel data loaders and workers performed computations at different levels of complexity — simpler operations (e.g., mean calculation) were handled by data loaders, while more complex transformations were processed by dedicated workers. The preprocessed data was asynchronously stored for later reuse in model training, reducing redundant computation.
- Model Training: DoorDash adopted the Single Program, Multiple Data (SPMD) paradigm to parallelize GPU execution across training tasks. The framework, built on TorchRec, was designed to maximize overlap between computation and communication, ensuring efficient utilization of GPU resources.
This structured breakdown of pre-processing and model training significantly improved training throughput by minimizing GPU idle time and maximizing hardware efficiency.
Additional Optimizations: Beyond model training, similar efficiency-driven strategies were implemented to:
- Enhance offline model evaluation, creating a feedback loop to continuously refine the training pipeline.
- Ensure alignment between ad model training and ad serving, bridging the gap between offline training and real-time inference for optimal ad performance.
DoorDash utilized Area Under the Curve (AUC) and Normalized Binary Cross Entropy (BCE) as key metrics for evaluating model performance.
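Both metrics can be sketched in plain Python. The normalization shown here is an assumption about what "Normalized BCE" means: cross entropy divided by the entropy of the empirical conversion rate, so values below 1.0 beat a constant base-rate predictor:

```python
import math

# Sketches of the two evaluation metrics. The normalization convention is
# assumed (BCE over the entropy of the empirical positive rate).

def auc(labels, scores):
    """AUC via pairwise comparison: P(score_pos > score_neg), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def normalized_bce(labels, probs, eps=1e-12):
    bce = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
               for y, p in zip(labels, probs)) / len(labels)
    base = sum(labels) / len(labels)          # empirical conversion rate
    base_entropy = -(base * math.log(base + eps)
                     + (1 - base) * math.log(1 - base + eps))
    return bce / base_entropy

labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, probs))                  # 0.75
print(normalized_bce(labels, probs) < 1)   # True: beats the base-rate model
```

The normalized form is useful for CVR work because raw BCE shifts with the background conversion rate, making cross-period comparisons misleading.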
Step 3: Model Evolution Stages
3.1 Adopting Deep Learning Recommendation Models
Running online A/B experiments comparing tree-based models with neural network-based models yielded clear positive results, providing a strong foundation for the migration plan. The next natural step was to explore DNN-specific architecture designs and feature engineering opportunities using DoorDash Ads’ logged training data to drive further improvements.
While the generalized strategy of adding sparse features initially showed marginal gains, scaling up the feature set introduced challenges such as severe model overfitting, ultimately widening the gap between offline and online performance. This highlighted the need for deeper analysis and systematic debugging to unlock additional model improvements.
3.2 Deep Personalization
For DoorDash, two key user behavior patterns emerged: (a) a high tendency for repeat purchases, and (b) a reluctance to explore new options. Additionally, external factors — particularly the time of order — were found to play a critical role in user decision-making.
To address these insights, new features were engineered to capture signals such as daypart (time of day), user preferences for stores and dishes, and overall price sensitivity. Pre-trained embeddings were also introduced to mitigate cold-start and data sparsity challenges.
Overall, these enhancements contributed to an approximate 2.8% improvement in conversion rate (CVR).
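As an illustration, a daypart feature could be as simple as bucketing the order hour into meal windows. The boundaries below are invented for the sketch, not DoorDash's actual definitions:

```python
# Hypothetical daypart bucketing for a meal-delivery platform.
# Window boundaries are illustrative assumptions.

def daypart(hour):
    """Map an order hour (0-23) to a coarse meal-window category."""
    if 5 <= hour < 11:
        return "breakfast"
    if 11 <= hour < 15:
        return "lunch"
    if 15 <= hour < 22:
        return "dinner"
    return "late_night"

print(daypart(12), daypart(19), daypart(2))  # lunch dinner late_night
```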
3.3 Closing the Loop: Bridging the AUC Gap Between Offline Training and Online Serving
After all the improvements, a gap of 4.3% in AUC remained between the offline model evaluation and live online performance. The initial hypothesis pointed to the age of the training data, which was almost three months old by the time the online experiment was launched. However, further analysis showed that the data distribution had remained stable over time, ruling out this explanation.
Deeper investigations identified that discrepancies between online-logged features and offline-joined features were responsible for almost the entire 4.3% AUC gap (measured at 4.25%), establishing this as the primary issue. These discrepancies typically arose from delays in feature data or from stale cached values (“cached residuals”).
To address missing features, DoorDash implemented feature-specific join windows to better accommodate known feature delays. While this solution partially mitigated the problem of cached residuals, it was not sufficient for complete alignment.
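A feature-specific join window might be sketched as follows; the feature names and window lengths here are hypothetical, chosen only to show the mechanism of accepting the freshest snapshot within each feature's known delay tolerance:

```python
# Sketch of feature-specific join windows (hypothetical names/windows):
# for each feature, join the most recent offline snapshot that is no older
# than that feature's tolerated delay; otherwise treat it as missing.

JOIN_WINDOW_SEC = {
    "store_ctr_7d": 3600,             # refreshed hourly -> 1h window
    "user_price_sensitivity": 86400,  # refreshed daily  -> 24h window
}

def join_feature(name, request_ts, snapshots):
    """snapshots: list of (timestamp, value) pairs for one feature."""
    window = JOIN_WINDOW_SEC[name]
    eligible = [(ts, v) for ts, v in snapshots
                if 0 <= request_ts - ts <= window]  # no future leakage
    if not eligible:
        return None                                 # stale: feature missing
    return max(eligible)[1]                         # freshest in-window value

snaps = [(1000, 0.12), (4000, 0.15), (9000, 0.18)]
print(join_feature("store_ctr_7d", 5000, snaps))   # 0.15 (ts=4000 in window)
print(join_feature("store_ctr_7d", 20000, snaps))  # None (all too stale)
```

The key property is that the offline join reproduces what the online system would actually have seen at request time, rather than leaking fresher values into training.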
To fully address the residual caching issue, the team enabled online logging for the features most susceptible to inconsistencies. However, online logging introduced new trade-offs, notably increased demand on critical system resources. Careful planning and targeted optimizations — such as infrastructure upgrades and parallelizing key processes like bidding and CVR prediction — helped minimize the overhead.
Even with these efforts, there was still a 10% increase in load on the prediction service. Nevertheless, the business gains from improved ad ranking quality and reduced average latency ultimately justified the additional cost.
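The bidding/CVR parallelization mentioned above could look roughly like this sketch, where stub functions with made-up latencies stand in for the real services:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Sketch of the parallelization idea: run bid computation and CVR prediction
# concurrently, so per-request latency approaches max(t_bid, t_cvr) instead
# of t_bid + t_cvr. Both functions are stubs with invented latencies.

def compute_bid(ad_id):
    time.sleep(0.05)          # stand-in for the bidding service call
    return 4.0

def predict_cvr(ad_id):
    time.sleep(0.05)          # stand-in for the model-serving call
    return 0.08

def score_ad(ad_id):
    with ThreadPoolExecutor(max_workers=2) as pool:
        bid_future = pool.submit(compute_bid, ad_id)
        cvr_future = pool.submit(predict_cvr, ad_id)
        return bid_future.result() * cvr_future.result()  # bid * pCVR

start = time.perf_counter()
score = score_ad("ad_x")
print(round(score, 2))                                    # 0.32
print(f"wall time ≈ {time.perf_counter() - start:.2f}s")  # ~one call, not two
```

Overlapping the two calls is what lets the heavier DNN inference path fit inside the existing latency budget.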
Conclusion
Deep Neural Networks (DNNs) are emerging as a superior alternative to traditional tree-based models in the evolving landscape of e-commerce ad ranking. Through the case study of DoorDash Ads, we saw how DNNs enabled richer feature representation, deeper personalization, and scalable training architectures. The migration not only improved model accuracy and CVR but also demonstrated that with the right infrastructure and optimization, DNNs can deliver tangible business value. As such, they stand out as a robust foundation for the next generation of ad ranking systems.