How to Use Only 1 Metric in AB Tests

Last Updated on November 4, 2024 by Editorial Team

Author(s): Pavel Zapolskii

Originally published on Towards AI.

The Most Important Number, or About the Main Product Metric

Imagine a product you use every day: an online store, a streaming service, or a game. How do companies improve them? What should they add or remove, and where should they invest more resources? To answer these questions in an enterprise setting, there is a main product metric.

In this article, we will discuss what it is and how it relates to KPIs, intensive A/B testing, and other indicators.

Defining the Main Metric or Acceptance AB Metric

The main metric is a function that measures the key value of the product. It embodies the essence of the product and its impact on the business, regardless of the sector, whether e-commerce, fintech, or gamedev. For the metric to work, it must possess four important properties:

  1. Quality: Indicates how good the product is for the customer. For example, how much time users spend on the site or how often they return (all components that make up retention).
  2. Profitability: Don't forget about money! The product should generate revenue, for example, through ad clicks. The metric should reflect this.
  3. Measurability: This property matters most for large corporations that want to build up their data expertise. The metric must therefore be easy to measure, so that A/B tests can be run and decisions made based on it.
  4. Interpretability: The indicator must be understandable not only to data specialists but also to the business side. Therefore, it should correlate with KPIs and financial reports. If the metric is difficult to explain, it may lead to incorrect decisions.

Problems with the Key Metric

It is important to understand that the integral metric is primarily relevant for enterprise companies. Due to this specificity, a number of problems can be expected:

  • Multiple Testing: When several indicators are evaluated simultaneously, the chance of a false positive in an A/B test increases.
  • Low Test Density: The number of tests that can be run needs to increase, especially when only simple techniques like CUPED are used.
  • Unclear How to Assess the Effect in Controversial Situations: For example, what should we do if engagement increases but profitability decreases?
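The first problem in the list above can be made concrete with a little arithmetic. The sketch below assumes the indicators are statistically independent and each is tested at the same significance level; real product metrics are usually correlated, so treat the numbers as an illustration only. The function names are hypothetical.

```python
def family_wise_error_rate(alpha: float, n_metrics: int) -> float:
    """Chance of at least one false positive across n independent metrics."""
    return 1.0 - (1.0 - alpha) ** n_metrics


def bonferroni_alpha(alpha: float, n_metrics: int) -> float:
    """Per-metric significance level under a Bonferroni correction."""
    return alpha / n_metrics


# Assumption: every metric is tested at alpha = 0.05.
for n in (1, 5, 10, 20):
    print(n, round(family_wise_error_rate(0.05, n), 3))
```

With ten independent indicators the chance of at least one false alarm is already about 40%, which is one reason to decide on a single metric.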

Let's break down the last situation with an example.

Suppose we have a website where we place advertisements. If we spam it entirely with these ad banners, the revenue metric might increase: people simply cannot avoid clicking on them. However, users who find this annoying will start leaving. As a result, in the long term, the product will lose out. Therefore, it is important to maintain a balance between the profitability of a particular medium and its quality.

The main metric is also useful for maintaining a balance between product quality and monetization during monitoring.

Imagine we rolled out a patch on July 14th, and it turned out to be bad: as a result, we spammed users with offers. This happened because we forgot to conduct an A/B test. On July 20th, we noticed something was wrong when we saw that our metric went out of the confidence interval.

But you will agree that it would be impractical to monitor every indicator with confidence intervals and rolling windows: with that many indicators, random false alerts would be unavoidable.

Thus, even here, a single sensitive integral metric is needed.
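A monitoring rule of the kind described above might look like the following sketch. It assumes a rolling window of the previous 7 days and a band of 1.96 standard deviations around the window mean; both values, the function name, and the sample data are hypothetical.

```python
from statistics import mean, stdev


def out_of_band_days(values, window=7, z=1.96):
    """Indices of days whose value leaves mean +/- z*std of the prior window."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if abs(values[i] - mu) > z * sigma:
            alerts.append(i)
    return alerts


# Hypothetical daily values of the main metric; the last two days follow a bad release.
daily_metric = [100, 101, 99, 100, 102, 98, 100, 101, 99, 100, 80, 79]
alerts = out_of_band_days(daily_metric)
print(alerts)
```

Running the same rule over dozens of indicators multiplies the number of chance alerts, which is why a single monitored metric is easier to operate.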

In Search of a Metric | Why is it Bad to Look at GMV?

GMV (Gross Merchandise Value) is the total value of all goods sold on the platform within a unit of time.

If we choose GMV as the key metric, we can easily come up with solutions that yield short-term results but become strategically unfavorable.

For example, suppose we have a website that sells slippers 🩴. We can pile on many random offers from different stores, with various snippets. Immediately after that, as we expect, people will start buying more because our assortment is richer. In the first month, we will thus increase GMV. However, it will then turn out that the slippers are of poor quality. Therefore, a second conversion for the same customer is unlikely 😞

A strategically sensible solution would be to improve product quality by setting up filtering and other quality indicators. Yes, we will reduce GMV. But the user won't encounter, for example, toys 🔞 on the slipper page.

In Search of the Main Metric | Q&A

What should the sensitivity of the main metric be?

0.5–0.8x relative to a click (i.e., sufficiently sensitive).

What distribution is necessary?

Means are distributed normally or log-normally.

How should the metric be related to ARPU, Retention, and DAU (i.e., financial and product indicators)?

The correlation should be above 0.7 over long time horizons, so that profit or engagement indicators never rise while the key metric falls.
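The correlation requirement can be checked with a few lines of code. The sketch below computes the Pearson correlation between a candidate metric and each business indicator and applies the 0.7 threshold from the answer above; the weekly series and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def passes_long_run_check(candidate, business_series, threshold=0.7):
    """True if the candidate correlates above the threshold with every indicator."""
    return all(pearson(candidate, s) > threshold for s in business_series.values())


# Hypothetical weekly values.
candidate = [1.0, 1.2, 1.1, 1.4, 1.5, 1.7]
business = {
    "ARPU": [10, 11, 10.5, 12, 13, 14],
    "Retention": [0.30, 0.31, 0.30, 0.33, 0.34, 0.35],
}
print(passes_long_run_check(candidate, business))
```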

Formulation and Training of the Main Metric

How do you find this main metric? It can be seen as a machine learning problem. We take many small metrics (clicks, time on site, transactions, and so on) and try to create a single function from them that is sensitive to changes in the product. For this, a classic method is used: linear regression. If you, like the cat below, want a bit more math, here's what the formula looks like:

A cat to attract

Such a sensitive metric will be called the NorthStar ✨✨✨
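As a rough sketch in our own notation (the symbols below are assumptions, not necessarily the author's exact formula), a linear metric of this kind combines component metrics $m_1, \dots, m_k$ with learned weights $w_i$, and the weights are fit by least squares on a set of $n$ past experiments:

```latex
\mathrm{NorthStar}(x) = w_0 + \sum_{i=1}^{k} w_i \, m_i(x),
\qquad
\hat{w} = \arg\min_{w} \sum_{j=1}^{n}
  \Bigl( y_j - w_0 - \sum_{i=1}^{k} w_i \, m_i(x_j) \Bigr)^{2}
```

Here $x_j$ denotes experiment $j$ and $y_j$ its label. The choice of label is an assumption: one simple option is $+1$ for a known improvement, $-1$ for a known degradation, and $0$ for an AA test.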

How to Test and Train NorthStar?

Using a dataset of experiments. Usually, it consists of 20% AA tests, 30% improving tests, and 50% degrading tests. I will explain the latter two below.

  • Improving Test: A test where an improvement in all key product metrics is clearly visible or a situation with average indicators at the start but high results after the release (for example, if a feature fundamentally improves the product).
  • Degrading Test: An experiment where we artificially remove a feature from production for a focus group to detect a decline in key indicators. For example, we reduce the quality of page loading or degrade the quality of ML models. Then we observe how dissatisfied the user becomes 😡.
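A minimal training sketch on such a dataset is shown below. It assumes each experiment is summarized by the percentage change of three component metrics and labeled +1 (improving), -1 (degrading), or 0 (AA); the labeling scheme, the data, and the function names are hypothetical, and plain gradient descent stands in for whatever regression solver a team would actually use.

```python
def predict(w, x):
    """Linear metric: bias w[0] plus the weighted sum of component changes."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_linear_metric(rows, labels, lr=0.05, epochs=2000):
    """Fit weights by batch gradient descent on squared error."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical experiments: % change in (clicks, time on site, transactions).
# The mix follows the text: 20% AA, 30% improving, 50% degrading.
experiments = [
    ([0.1, -0.2, 0.0], 0), ([-0.1, 0.1, 0.1], 0),
    ([2.0, 1.5, 1.0], 1), ([1.0, 2.5, 0.5], 1), ([1.5, 0.5, 2.0], 1),
    ([-2.0, -1.0, -1.5], -1), ([-1.0, -2.0, -0.5], -1),
    ([-0.5, -1.5, -2.0], -1), ([-2.5, -0.5, -1.0], -1),
    ([-1.5, -2.5, -1.0], -1),
]
rows = [x for x, _ in experiments]
labels = [y for _, y in experiments]
weights = fit_linear_metric(rows, labels)

for x, y in experiments:
    print(y, round(predict(weights, x), 2))
```

On this toy data the fitted metric scores improving tests positive and degrading tests negative, which is the behavior the next section asks you to verify on held-out experiments.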

How to verify that the metric works?

  1. Use Cross-Validation

Take 80% of the data for training and 20% for testing. On the held-out tests, degrading tests should come out red (bad) and improving tests green (good). You also need to ensure that the Z-score value is low. The Z-score measures how far a result deviates from the average.
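The hold-out part of this check can be sketched as follows. The fitting step is omitted: `weights` is a hypothetical stand-in for weights learned on the training fold, the experiment data is made up, and AA tests are left out because they have no expected color.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle reproducibly and split into 80% train and 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def color(score):
    """Green for a positive metric change, red otherwise."""
    return "green" if score > 0 else "red"


def color_agreement(weights, experiments):
    """Share of experiments whose color matches their known type."""
    expected = {"improving": "green", "degrading": "red"}
    if not experiments:
        raise ValueError("no experiments to check")
    hits = 0
    for kind, changes in experiments:
        score = sum(w * c for w, c in zip(weights, changes))
        hits += color(score) == expected[kind]
    return hits / len(experiments)


# Hypothetical experiments: % change in (clicks, time on site, transactions).
experiments = [
    ("improving", (2.0, 1.5, 1.0)), ("improving", (1.0, 2.5, 0.5)),
    ("improving", (1.5, 0.5, 2.0)), ("improving", (0.8, 1.2, 0.4)),
    ("degrading", (-2.0, -1.0, -1.5)), ("degrading", (-1.0, -2.0, -0.5)),
    ("degrading", (-0.5, -1.5, -2.0)), ("degrading", (-2.5, -0.5, -1.0)),
    ("degrading", (-1.5, -2.5, -1.0)), ("degrading", (-0.7, -0.9, -1.1)),
]
train, test = split_80_20(experiments)
weights = (0.22, 0.24, 0.25)  # assumed to come from a fit on `train`
print(color_agreement(weights, test))
```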

  2. Ensure the Metric is Linked to Real Business Indicators

For example, revenue or key performance indicators (KPIs). Imagine you tested the metric and everything looks excellent, but a few months later it turns out that the tests were actually moving in the opposite direction of the company's goals 😱 Therefore, it is important that the metric correlates well with business indicators.

  3. Eliminate the Risk of Overfitting

The metric should not depend too much on a single parameter. Stability is crucial: with slight changes in parameters, the key indicator should fluctuate minimally.
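One simple way to spot dependence on a single parameter is to look at how much of the metric's typical movement each component accounts for. The sketch below is an assumption about how such a check could be done, not a method from the article; the 50% cutoff, the weights, and the typical changes are hypothetical.

```python
def contribution_shares(weights, typical_changes):
    """Share of the metric's typical movement attributable to each component."""
    parts = {name: abs(weights[name] * typical_changes[name]) for name in weights}
    total = sum(parts.values())
    if total == 0:
        raise ValueError("all contributions are zero")
    return {name: part / total for name, part in parts.items()}


def dominant_components(shares, max_share=0.5):
    """Components whose share exceeds the allowed maximum."""
    return [name for name, share in shares.items() if share > max_share]


# Hypothetical weights and typical per-experiment changes (in %).
weights = {"clicks": 0.2, "time_on_site": 0.2, "transactions": 0.9}
typical_changes = {"clicks": 1.0, "time_on_site": 1.0, "transactions": 2.0}

shares = contribution_shares(weights, typical_changes)
print(dominant_components(shares))
```

If one component dominates, the metric is close to being that component under another name, and small changes in its weight will swing the result.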

  4. Configure Predictability (Optional)

This is an additional indicator, for ML geniuses in large corporations.

There is a method that improves the accuracy of A/B tests using predictors, that is, indicators that help predict the test outcome. It is important to check how well the metric can be predicted. This matters in synthetic testing, where clients cannot be cleanly divided into groups and causal inference techniques are used to predict the impact of changes.

But be prepared: even after completing all the steps, the metric may be difficult to predict. And this is a signal that something needs to be improved.

The Ideal Model for NorthStar

  1. Based on Linear Regression.

  2. All Components of the Metric (clicks, time, conversions) must be strictly positive or strictly negative. For example, if improving one metric (say, an increase in clicks) enhances product quality, it is a positive component. Mark it with a plus. If a person exits the app without completing the conversion action, it is a negative component. Mark it with a minus.

  3. The Less Correlated the Components of the Metric, the Better. To better formulate NorthStar, you need to cover the entire business space of your product with a blanket of various indicators. The farther apart the corners of the business that the metric can see, the more accurately it reflects reality.

When things are running smoothly and your team has time to spare, why not let them do it, and then hit the conference to obtain the glory?

Mr. Zapolskii

So, What's the Result?

Using the main metric has several advantages.

  1. Intensive A/B Testing: The metric's sensitivity allows for intensive A/B testing, increasing testing intensity by 5–7 times.
  2. Easy Interpretability: This is very important for business. It can lighten the load on the analytics team and give managers the ability to make decisions based on understandable data. Moreover, it is easy to draw conclusions about the reasons for the success or failure of a feature.
  3. Protection from Incorrect Decisions: By choosing an option that works in the long term instead of "spamming the feed with banners," we protect ourselves from wrong decisions.
Weigh all the criteria for success.

Important! To set up NorthStar, the company must have an analyst who is proficient in machine learning at an advanced level.

Implementation of the NorthStar (NS) metric can be hindered by a poorly organized KPI that does not correlate with our key indicator.

Alternatives to the Component Metric, examples

  • GMV: for e-commerce. But be cautious with it (as discussed above).
  • Total View Time (the total viewing time) is a popular metric in, for example, media services. It is good for understanding how much users like the content, but it is too susceptible to seasonality and does not always provide accurate results in tests.

🦸 The main product metric is a superhero that always guards the quality and profitability of your product. It helps prevent the chase for money at the expense of the user and allows effective decision-making based on data. Find your main metric, balance quality and profitability, and move forward, towards success!


Published via Towards AI
