Stacking Results: Alibaba Improves Search Services for Online Shoppers

Last Updated on July 25, 2023 by Editorial Team

Author(s): Alibaba Tech

Originally published on Towards AI.

Academic Alibaba, WWW Series | Towards AI

Experimenting with hierarchical reinforcement learning to obtain remarkable results on customer satisfaction


This article is part of the Academic Alibaba series and is taken from the WWW 2019 paper entitled “Aggregating E-commerce Search Results from Heterogeneous Sources via Hierarchical Reinforcement Learning” by Ryuichi Takanobu, Tao Zhuang, Minlie Huang, Jun Feng, Haihong Tang, and Bo Zheng. The full paper can be read here.

Search for a type of cuisine in a web search engine and you'll likely reach a results page that starts with a map of nearby restaurants. Add a word like “making” to the query, though, and you're more likely to see an excerpt from a recipe, followed shortly by links to cooking videos.

Like web search engines, e-commerce platforms such as Taobao generate aggregated search results whose content types and order vary with the system's estimate of each user's interests. Alongside links to product pages, for example, results can include blog posts or topic groups about a type of merchandise, reflecting complex algorithmic decisions about whether to pitch a sale directly or to influence a later purchase. To make these choices, systems sort through an enormous pool of candidate items by assigning each item a relevance score. The problem is that relevance scores from different source types (or “verticals”) are not directly comparable, which makes it hard to combine items from heterogeneous verticals into pages that drive sales. Furthermore, whereas web search engines make this choice just once per search, e-commerce applications need to select verticals anew for every page of results to meet shoppers' expectations.
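To make the scale problem concrete, here is a toy Python sketch. The vertical names, item IDs, and scores are hypothetical, chosen only to illustrate the point: pooling items by raw score lets whichever vertical happens to score on a larger scale dominate the page, while ranking each vertical separately never compares scores across verticals at all.

```python
from typing import Dict, List

def naive_merge(verticals: Dict[str, Dict[str, float]]) -> List[str]:
    """Pool every item and sort by raw score across verticals (the flawed approach)."""
    pooled = [(score, item) for scores in verticals.values() for item, score in scores.items()]
    return [item for score, item in sorted(pooled, reverse=True)]

def per_vertical_stacks(verticals: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Rank items only within their own vertical; scores are never compared across verticals."""
    return {name: sorted(scores, key=scores.get, reverse=True) for name, scores in verticals.items()}

# Hypothetical scores: each vertical's ranker uses its own, unrelated scale.
verticals = {
    "product": {"p1": 0.92, "p2": 0.85, "p3": 0.80},  # e.g. probability-like scores
    "blog": {"b1": 14.0, "b2": 9.5},                  # e.g. unnormalized text-match scores
}

print(naive_merge(verticals))          # ['b1', 'b2', 'p1', 'p2', 'p3']
print(per_vertical_stacks(verticals))  # {'product': ['p1', 'p2', 'p3'], 'blog': ['b1', 'b2']}
```

Dividing the blog scores by 20 would flip the naive ordering so that every product comes first, yet leave both per-vertical stacks unchanged. The merged page therefore reflects an arbitrary choice of scale rather than relevance.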

*Results (in Chinese) for an aggregate search of “Dress” on Taobao, with results from the Topic and Blog verticals in slots 2 and 5, respectively.*

Now, researchers at Alibaba have proposed a novel hierarchical reinforcement learning (HRL) model that splits result selection into two separate subtasks, source selection and item presentation, so that items only ever need to be ranked against other items from the same source type. By formulating both subtasks as sequential decision problems that learn from user behavior, the model significantly outperformed its predecessors on key metrics in tests using real-time search data from Taobao.
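The summary does not describe the training procedure, so as a hedged illustration of what "a sequential decision problem that learns from user behavior" can mean, here is a toy tabular Q-learning sketch of the high-level source-selection step only. Everything in it is an assumption for illustration: the vertical names, the three-page session, and the deterministic `EXPECTED_CLICKS` user model are hypothetical, and a lookup table stands in for the paper's Q-network.

```python
import random
from typing import Dict, List

# Hypothetical toy setup: a session spans NUM_PAGES result pages, and on each page the
# high-level policy chooses one vertical to mix in alongside products (or none).
VERTICALS = ["none", "topic", "blog"]
NUM_PAGES = 3

# Hypothetical deterministic user model: expected clicks when a vertical is mixed into a page.
EXPECTED_CLICKS = {
    0: {"none": 1.0, "topic": 1.5, "blog": 0.8},
    1: {"none": 1.0, "topic": 0.9, "blog": 1.4},
    2: {"none": 1.2, "topic": 0.7, "blog": 0.6},
}

def train_q_table(episodes: int = 3000, alpha: float = 0.1, gamma: float = 0.9,
                  epsilon: float = 0.2, seed: int = 0) -> Dict[int, Dict[str, float]]:
    rng = random.Random(seed)
    # Q[page][vertical]; the extra page NUM_PAGES is terminal and its values stay at zero.
    q = {p: {v: 0.0 for v in VERTICALS} for p in range(NUM_PAGES + 1)}
    for _ in range(episodes):
        for page in range(NUM_PAGES):
            # Epsilon-greedy: mostly exploit current estimates, occasionally explore.
            if rng.random() < epsilon:
                action = rng.choice(VERTICALS)
            else:
                action = max(VERTICALS, key=lambda v: q[page][v])
            reward = EXPECTED_CLICKS[page][action]
            next_value = max(q[page + 1].values())
            # One-step Q-learning update toward reward + discounted value of the next page.
            q[page][action] += alpha * (reward + gamma * next_value - q[page][action])
    return q

def greedy_policy(q: Dict[int, Dict[str, float]]) -> List[str]:
    """Pick the highest-valued vertical for each page."""
    return [max(VERTICALS, key=lambda v: q[p][v]) for p in range(NUM_PAGES)]

q = train_q_table()
print(greedy_policy(q))  # ['topic', 'blog', 'none']
```

Because the choice is made page by page with a value for future pages, the learned policy can mix in different verticals on different pages of the same session, which is the kind of repeated, per-page decision the article says e-commerce search requires.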

Putting Verticals in Perspective

Recent studies of aggregated search indicate that mixing results from heterogeneous sources lets users explore different verticals according to their preferences. However, studies have also shown that presenting irrelevant verticals can quickly create negative impressions, so any attempt to introduce variety carries some risk.

To improve vertical selection, the proposed model introduces a high-level policy that responds to sequential patterns in user behavior. It passes its choices to a low-level presentation policy that fills the page slot by slot: the items in each vertical are ranked into their own stack, and each display position is filled with the top item of one of those stacks. Because items are only ever ranked against others from the same vertical, the problem of mismatched relevance scores across verticals is avoided entirely. In addition, the model uses a Q-network for both policies to limit divergence and oscillation during training and to allow constraints that may be needed for commercial reasons, such as keeping blog posts off a results page.
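The slot-filling step can be sketched as follows. This is a hedged illustration, not the paper's implementation: `toy_q` is a hypothetical stand-in for the learned Q-network, and handling the business constraint by masking banned verticals out of the action set is one common approach that we assume here. At each slot the policy compares verticals by their Q-values and takes the top item of the winning stack, so raw relevance scores from different verticals are never compared.

```python
from typing import AbstractSet, Callable, Dict, List

def fill_slots(stacks: Dict[str, List[str]], num_slots: int,
               q_value: Callable[[int, str], float],
               banned: AbstractSet[str] = frozenset()) -> List[str]:
    """Fill display slots one at a time, each with the top item of some vertical's stack."""
    remaining = {v: list(items) for v, items in stacks.items()}  # copy: caller's stacks stay intact
    page: List[str] = []
    for slot in range(num_slots):
        # Action mask: drop banned verticals (a business constraint) and exhausted stacks.
        allowed = [v for v, items in remaining.items() if items and v not in banned]
        if not allowed:
            break
        # Compare verticals by Q-value, never by the items' raw relevance scores.
        chosen = max(allowed, key=lambda v: q_value(slot, v))
        page.append(remaining[chosen].pop(0))  # take the top item of the chosen stack
    return page

def toy_q(slot: int, vertical: str) -> float:
    """Hypothetical Q-values: products by default, a topic group in slot 2, a blog post in slot 5."""
    preferred = {1: "topic", 4: "blog"}  # 0-based slot indices
    if preferred.get(slot) == vertical:
        return 1.0
    return 0.5 if vertical == "product" else 0.0

stacks = {"product": ["p1", "p2", "p3", "p4", "p5"], "topic": ["t1"], "blog": ["b1"]}
print(fill_slots(stacks, 6, toy_q))                   # ['p1', 't1', 'p2', 'p3', 'b1', 'p4']
print(fill_slots(stacks, 6, toy_q, banned={"blog"}))  # ['p1', 't1', 'p2', 'p3', 'p4', 'p5']
```

Masking keeps the constraint outside the learned values: the same Q-function serves both pages, and the banned vertical simply never becomes an eligible action.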

*Overview of the item selection process in which slots are filled with items from ranked stacks in verticals A, B, and C*

Trial by Taobao

To evaluate the proposed model's performance, the researchers compared it with competing methods on hourly real-time search log data, using a methodology called bucket testing, more widely known as A/B testing. Users were randomly hashed into buckets of equal size and similar distribution, and each bucket was served by one algorithm for two weeks to ensure statistically stable results. Key metrics included click-through rate (CTR) per vertical, average dwell time (ADT) in seconds spent on each vertical, and coverage (COV), the number of slots a vertical occupied. In addition, the gross merchandise volume (GMV) of each search service was measured to test its ability to generate revenue.
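A minimal sketch of the two mechanical pieces of this setup: deterministic hash-based bucket assignment and per-vertical metrics computed from an impression log. All of it is assumed for illustration: the salt string, the `Impression` record layout, and the exact metric definitions (ADT averaged over clicked impressions, COV as slots per page) are hypothetical, and the paper's definitions may differ.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

def assign_bucket(user_id: str, num_buckets: int, salt: str = "hrl-test") -> int:
    """Hash a user into one of num_buckets buckets; the same user always lands in the same one."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

@dataclass
class Impression:
    page_id: str                # one displayed result page
    vertical: str               # e.g. "product", "topic", "blog"
    clicked: bool
    dwell_seconds: float = 0.0  # time spent after a click; 0 if not clicked

def vertical_metrics(log: List[Impression]) -> Dict[str, Dict[str, float]]:
    """Per-vertical CTR, ADT (mean dwell per click), and COV (slots occupied per page)."""
    shown: Dict[str, int] = defaultdict(int)
    clicks: Dict[str, int] = defaultdict(int)
    dwell: Dict[str, float] = defaultdict(float)
    num_pages = len({imp.page_id for imp in log})
    for imp in log:
        shown[imp.vertical] += 1
        if imp.clicked:
            clicks[imp.vertical] += 1
            dwell[imp.vertical] += imp.dwell_seconds
    return {
        v: {
            "CTR": clicks[v] / shown[v],
            "ADT": dwell[v] / clicks[v] if clicks[v] else 0.0,
            "COV": shown[v] / num_pages,
        }
        for v in list(shown)
    }

# Hypothetical log for one bucket: two pages of results.
log = [
    Impression("page1", "product", True, 30.0),
    Impression("page1", "product", False),
    Impression("page1", "topic", True, 12.0),
    Impression("page2", "product", True, 50.0),
    Impression("page2", "blog", False),
    Impression("page2", "product", False),
]
print(vertical_metrics(log)["product"])  # {'CTR': 0.5, 'ADT': 40.0, 'COV': 2.0}
```

Hashing with a fixed salt keeps each user's assignment stable for the whole two-week test without storing a lookup table; changing the salt reshuffles users for a new experiment.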

Compared with several rule-based and learning-to-aggregate models, the proposed HRL model achieved a far greater CTR improvement over the baseline while performing on par with the others in ADT and COV. Furthermore, the HRL model was the only one to meaningfully increase GMV for product searches, an area where the other models failed even when they raised GMV for the topic group and blog verticals.
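The summary does not report how the bucket comparisons were tested, so here is a hedged sketch of one standard way to compare a treatment bucket's CTR with the baseline's: relative lift plus a two-proportion z-test. The click and view counts below are hypothetical, not figures from the paper.

```python
import math
from typing import Tuple

def ctr_lift_and_z(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> Tuple[float, float]:
    """Compare treatment bucket A with baseline bucket B.

    Returns (relative CTR lift of A over B, two-proportion z statistic).
    """
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled click rate under the null hypothesis that both buckets share one CTR.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_a - p_b) / p_b, (p_a - p_b) / se

lift, z = ctr_lift_and_z(630, 10_000, 600, 10_000)  # hypothetical counts
print(f"lift {lift:+.1%}, z = {z:.2f}")  # lift +5.0%, z = 0.88
```

Here a 5% relative lift on 10,000 views per bucket gives |z| below 1.96, so it would not yet be significant at the 5% level. Larger samples over a longer run, like the two-week buckets described above, are what make such differences measurable.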

The full paper can be read on arXiv.

Alibaba Tech

First-hand, in-depth information about Alibaba’s latest technology → Facebook: “Alibaba Tech”. Twitter: “AlibabaTech”.

