Stacking Results: Alibaba Improves Search Services for Online Shoppers
Last Updated on July 25, 2023 by Editorial Team
Author(s): Alibaba Tech
Originally published on Towards AI.
Academic Alibaba, WWW Series | Towards AI
Experimenting with hierarchical reinforcement learning to obtain remarkable results on customer satisfaction
This article is part of the Academic Alibaba series and is taken from the WWW 2019 paper entitled "Aggregating E-commerce Search Results from Heterogeneous Sources via Hierarchical Reinforcement Learning" by Ryuichi Takanobu, Tao Zhuang, Minlie Huang, Jun Feng, Haihong Tang, and Bo Zheng. The full paper can be read here.
Search for a type of cuisine in your web browser and you'll likely reach a results page that starts with a map of restaurants near you. Add a word like "making" to your search, though, and you'll more likely be presented with an excerpt from a recipe, followed shortly by links to cooking videos.
Like web search engines, e-commerce platforms such as Taobao generate aggregate search results that vary in type and order based on perceptions about users' interests. Along with links to product pages, for example, results can include blog posts or topic groups about a type of merchandise, reflecting complex algorithmic decisions about whether to pitch a sale directly or influence a later purchase. To make these choices, systems sort through an enormous pool of candidate items by assigning each item a relevance score. The problem is that the relevance scores for different source types, or "verticals," are not directly comparable, which makes it hard for systems to compile items from heterogeneous verticals into pages that will drive sales. Furthermore, whereas web search engines do this just once per search, e-commerce applications need to select verticals repeatedly for each page of results to effectively meet shoppers' expectations.
Now, researchers at Alibaba have proposed a novel hierarchical reinforcement learning (HRL) model that splits results selection into two separate tasks, source selection and item presentation, so that items need to be ranked only within each chosen source type. By formulating both tasks as sequential decision problems that learn from user behavior, the model has generated significantly stronger results than predecessors in tests of key metrics using real-time search data from Taobao.
Putting Verticals in Perspective
Recent work studying the value of search results indicates that an aggregate of heterogeneous sources has the benefit of letting users explore various verticals based on preference. However, studies have also shown that presenting irrelevant verticals can quickly generate negative impressions, adding a measure of risk to any effort to introduce variety.
To enhance the selection of verticals, the proposed model introduces a high-level policy that responds to sequential patterns in user behavior. It then passes its choices to a low-level presentation policy that ranks items into stacks one vertical at a time, treating each display position as a slot option designated for the top item in the corresponding stack. As a result of this configuration, the problem of mismatched relevance scores across verticals is avoided entirely. In addition, the model applies a Q network to both policies to minimize divergence and oscillations during the training process and enable the application of constraints that may become necessary for commercial purposes, such as preventing the display of any blog posts in a results page.
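The two-level arrangement described above can be illustrated with a toy sketch. This is not the paper's implementation: the per-slot Q-values are hard-coded stand-ins for what a trained Q network would output, the greedy choice stands in for the learned high-level policy, and names such as `select_vertical`, `rank_stack`, `build_page`, and `banned` are hypothetical. The point is only to show that scores are compared within a vertical, never across verticals, and that a commercial constraint can be applied by masking a vertical before the choice is made.

```python
from collections import deque

def select_vertical(q_values, banned=frozenset()):
    """High-level step (assumed greedy): pick the allowed vertical with the highest Q-value."""
    allowed = {v: q for v, q in q_values.items() if v not in banned}
    if not allowed:
        raise ValueError("no vertical is allowed for this slot")
    return max(allowed, key=allowed.get)

def rank_stack(items):
    """Low-level step: order the items of ONE vertical by that vertical's own score."""
    return deque(sorted(items, key=lambda item: item[1], reverse=True))

def build_page(candidates, slot_q_values, banned=frozenset()):
    """Fill each display slot with the top item of the stack chosen for that slot."""
    stacks = {v: rank_stack(items) for v, items in candidates.items()}
    page = []
    for q_values in slot_q_values:
        # A vertical whose stack has run out cannot fill another slot.
        exhausted = {v for v, stack in stacks.items() if not stack}
        vertical = select_vertical(q_values, set(banned) | exhausted)
        item_id, _score = stacks[vertical].popleft()
        page.append((vertical, item_id))
    return page

# Illustrative data only. Note the different score scales per vertical:
# they are never compared with each other.
candidates = {
    "product": [("p1", 0.91), ("p2", 0.40), ("p3", 0.77)],
    "topic_group": [("t1", 12.0), ("t2", 30.5)],
    "blog": [("b1", 3.2)],
}
# One dict of made-up Q-values per display slot.
slot_q_values = [
    {"product": 0.8, "topic_group": 0.3, "blog": 0.1},
    {"product": 0.2, "topic_group": 0.6, "blog": 0.5},
    {"product": 0.4, "topic_group": 0.1, "blog": 0.9},
]

page = build_page(candidates, slot_q_values)
page_without_blogs = build_page(candidates, slot_q_values, banned={"blog"})
print(page)
print(page_without_blogs)
```

With the blog vertical masked, the third slot falls back to the next-best allowed vertical, which is how a rule such as "no blog posts on this page" could be enforced without retraining anything in this simplified setting.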
Trial by Taobao
To evaluate the proposed model's performance, researchers tested it against competitors using hourly real-time search log data, following a methodology called bucket testing (known to many as A/B testing). In preparation, users were randomly hashed into buckets of equal size and distribution, with each bucket receiving one algorithm to test over a period of two weeks to ensure the statistical stability of results. Key metrics included click-through rate (CTR) per vertical, average dwell time (ADT) in seconds spent considering each vertical, and coverage (COV) in terms of the number of slots a vertical occupied. Additionally, the gross merchandise volume (GMV) for each search service was measured to test its ability to generate revenue.
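As a rough illustration of this setup, the sketch below hashes user IDs into buckets and computes per-vertical CTR, ADT, and COV from a small log. Everything here is an assumption made for the example: the log schema, the function names `assign_bucket` and `vertical_metrics`, the use of SHA-256 for hashing, and the exact metric definitions (CTR as clicks per displayed slot, ADT as mean dwell seconds per click, COV as a vertical's share of displayed slots). The paper's own definitions may differ.

```python
import hashlib
from collections import defaultdict

def assign_bucket(user_id: str, n_buckets: int) -> int:
    """Deterministically map a user to one of n_buckets via a hash of the ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def vertical_metrics(log):
    """Compute CTR, ADT, and COV per vertical.

    `log` holds one record per displayed slot with the (assumed) keys
    'vertical', 'clicked', and 'dwell_seconds'.
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    dwell = defaultdict(float)
    for record in log:
        vertical = record["vertical"]
        shown[vertical] += 1
        if record["clicked"]:
            clicks[vertical] += 1
            dwell[vertical] += record["dwell_seconds"]
    total_slots = sum(shown.values())
    return {
        v: {
            "ctr": clicks[v] / shown[v],
            "adt": dwell[v] / clicks[v] if clicks[v] else 0.0,
            "cov": shown[v] / total_slots,
        }
        for v in shown
    }

# Made-up log records for a single bucket.
example_log = [
    {"vertical": "product", "clicked": True, "dwell_seconds": 30.0},
    {"vertical": "product", "clicked": False, "dwell_seconds": 0.0},
    {"vertical": "blog", "clicked": True, "dwell_seconds": 10.0},
    {"vertical": "product", "clicked": True, "dwell_seconds": 50.0},
]
metrics = vertical_metrics(example_log)
bucket = assign_bucket("user-123", 4)
```

Because the bucket depends only on the hash of the user ID, the same user always sees the same algorithm for the duration of the test, which keeps the comparison between buckets clean.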
Compared with several models from rule-based and learning-to-aggregate approaches, the proposed HRL model achieved a far greater CTR improvement over the baseline while holding steady with the others' performance in ADT and COV. Furthermore, the HRL model was the only one to meaningfully increase the GMV for product searches, an area where the others failed even when they could raise GMV for topic group and blog verticals.
The full paper can be read on arXiv.
Alibaba Tech
First-hand and in-depth information about Alibaba's latest technology. Facebook: "Alibaba Tech". Twitter: "AlibabaTech".