
Defining Your Product’s North Star Metrics and Leading Indicators

Last Updated on July 17, 2023 by Editorial Team

Author(s): Lisa Cohen

Originally published on Towards AI.

One key role that data science teams play is defining metrics and setting targets for the product and company. A clear set of success metrics communicates the organization’s priorities objectively. It also helps the organization scale, because each team can set its own goals and innovate autonomously while laddering up to the overall objective. The OKR (objectives and key results) framework is an effective way to roll this out, with the key results tracking progress toward the long-term objective. While metric reporting can be a more repeatable and automated part of the job, a lot of data science innovation goes into developing good metrics.

Image by Angeles Balaguer from Pixabay

Metric terminology

There are typically two levels of metrics: north star metrics and leading indicators.

North star metric: The north star metric provides a clear vision for how to measure long-term success.

Leading indicators: Leading indicators (also called surrogates or drivers) represent specific actions, which happen on a shorter time scale, and cause long-term growth of the north star metric.

Overall, the north star represents where you want to go, and the leading indicator is how to get there.

Qualities of effective metrics

Effective metrics should be representative, understandable, predictive, sensitive, and include associated guardrails.

North star metrics:

Represent the mission: The north star metric choice and definition provide an opportunity to clearly articulate the vision of the product and organization in an objective, measurable way.

Understandable, simple and inspirational: In order to rally the company around this goal, a simple metric that everyone understands is more impactful than a complicated or obscure one.

Leading indicators:

Predictive of future success: We want the metric to be predictive of future success, to ensure that short-term metric optimization drives long-term desired results.

Sensitive: The metric should be sensitive to change on the timescale measured. Otherwise, the team will get stuck in metric reviews, unable to evaluate the impact of their work or compare the value of different investments, since every review shows the same immovable metric.

Guardrails:

Protect quality and safety: Having the right guardrails in place is a key step in any metric rollout. Guardrails reduce gaming of the metric and protect things like quality and safety, so that they aren’t sacrificed for the sake of metric growth.

Identifying leading indicators

In determining leading indicators, we want to identify engagement actions that are particularly impactful for this specific product, and verify that they causally drive long-term outcomes (so that investing in them will produce the desired result). Examples include “emails sent” for an email app or “social connections” for a social media app. These metrics are actionable for the team, since they can design product experiences that help customers successfully complete these tasks.

In terms of sensitivity, for online products, we want leading indicators to change by statistically significant amounts on an experiment timescale (1–2 weeks) and north star metrics to move on a quarterly timescale (1–3 months). With these timescales, we can use leading indicators to define the overall evaluation criteria (OEC) for experiments and then track north star metrics in monthly or quarterly reviews.
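As a rough illustration of the sensitivity check, the sketch below runs a two-sided two-proportion z-test on a leading indicator measured over a short experiment. The user counts, the “sent at least one email” indicator, and the 0.05 significance threshold are all made-up assumptions, not figures from a real product.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical 2-week experiment: users who sent at least one email.
z, p = two_proportion_z_test(conv_a=4_000, n_a=10_000, conv_b=4_200, n_b=10_000)
is_sensitive = p < 0.05  # assumed significance threshold
print(f"z={z:.2f}, p={p:.4f}, significant={is_sensitive}")
```

If a candidate indicator rarely clears this kind of bar within an experiment window, it is probably too slow-moving to serve as an OEC.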

Meta-analysis is a very effective way to identify leading indicators when you have a sufficient history of experiments to review. We can run a regression analysis across past experiments and identify the top metrics whose changes on an experiment timescale led to long-term changes in the north star metric. We’re looking to see that the leading and north star metrics move together over time, with the leading indicator moving first. We assess the long-term impact by shipping with a holdback and then viewing the impact over time. (“Zero” or feature-level holdbacks can also be developed to view the all-time impact for a particular feature area, but they are less sensitive to recent changes.)
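A simplified version of this meta-analysis can be sketched as follows. Instead of a full multivariate regression, it ranks each candidate metric by the correlation between its short-term lift and the later north star lift across past experiments. The metric names and lift values are invented for illustration, and a real analysis would also account for noise in the measured lifts.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical history: one value per past experiment (percent lift).
short_term_lifts = {
    "emails_sent": [0.5, 1.2, -0.3, 2.0, 0.8, 1.5],
    "settings_opens": [1.0, -0.5, 0.7, 0.2, -1.1, 0.4],
}
# North star lift for the same experiments, measured later via holdbacks.
north_star_lift = [0.2, 0.6, -0.1, 1.1, 0.3, 0.8]

# Rank candidates by the strength of their association with the north star.
ranked = sorted(
    ((name, pearson(lifts, north_star_lift)) for name, lifts in short_term_lifts.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r={r:.2f}")
```

A high correlation here only flags a candidate; the holdback comparison is still what establishes that the indicator leads the north star.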

In cases where experimentation isn’t possible or practical (e.g., it would be unfair to withhold a treatment from some top customer accounts, or a medical treatment would take decades to measure), observational analyses offer a way to identify causal drivers. DoWhy, DoubleML, and EconML are a few causal inference libraries we can use to construct comparable control groups (e.g., through propensity score matching) in the existing dataset and then evaluate the effect of the leading indicator as the treatment. We can also use causal inference to compare two populations with different outcomes and identify the causal drivers or cohorts that led to the difference.
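To show why adjusting for confounders matters, here is a minimal sketch that does not use the DoWhy, DoubleML, or EconML APIs. It swaps in plain stratification on a single confounder (account size) as a simpler stand-in for propensity score matching. The segments, counts, and the “adopted feature” treatment are hypothetical.

```python
# Hypothetical observational data:
# (segment, adopted_feature, retained) -> number of users.
counts = {
    ("small", True, True): 10, ("small", True, False): 10,
    ("small", False, True): 32, ("small", False, False): 48,
    ("large", True, True): 72, ("large", True, False): 8,
    ("large", False, True): 16, ("large", False, False): 4,
}

def retention_rate(treated, segment=None):
    """Retention among treated (or untreated) users, optionally in one segment."""
    rows = [
        (retained, n)
        for (seg, t, retained), n in counts.items()
        if t == treated and (segment is None or seg == segment)
    ]
    return sum(n for retained, n in rows if retained) / sum(n for _, n in rows)

# Naive comparison ignores that large accounts adopt more AND retain more.
naive_effect = retention_rate(True) - retention_rate(False)

# Stratified estimate: compare within each segment, then weight by segment size.
total_users = sum(counts.values())
adjusted_effect = 0.0
for segment in ("small", "large"):
    segment_users = sum(n for (seg, _, _), n in counts.items() if seg == segment)
    effect = retention_rate(True, segment) - retention_rate(False, segment)
    adjusted_effect += effect * segment_users / total_users

print(f"naive={naive_effect:.2f}, adjusted={adjusted_effect:.2f}")
```

In this toy data the naive comparison overstates the effect by more than three times, which is the kind of bias that matching or weighting on observed confounders is meant to reduce.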

Lastly, machine learning can be a scalable technique to identify candidates for leading indicators. For example, if we build a classification model that predicts whether or not a customer will be successful in the long-term metric, we can then analyze the SHAP feature importance of the predictive variables and test their causality. As discussed above, this can be done through experimentation where we run short-term experiments that move the leading indicator and check that the north star metric change follows in the holdback.

User research, hands-on use of the product, domain expertise, and analysis of the customer journeys of successful (vs. churned) users are additional ways to spark ideas for leading indicators to test.

Photo by Anna Nekrashevich: https://www.pexels.com/photo/magnifying-glass-on-top-of-document-6801648/

Magic moments

As a related concept, a magic moment is the “aha” or “wow” moment in product usage (e.g., 7 Facebook friends in 10 days), after which the customer experiences the value and is significantly more likely to retain and grow on the platform. One way to find it is through a cluster analysis of past user behavior that identifies the inflection points in growth and retention. An advantage of data-driven approaches to establishing these thresholds is that they mark meaningful inflection points in the user experience (so customers are less likely to move above and below the thresholds by chance). As in the previous section, we can use techniques including causal inference and experimentation to verify whether these moments truly lead to successful outcomes.
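One simple way to look for such an inflection point is a threshold scan rather than a full cluster analysis: for each candidate threshold, compare the retention of users at or above it with those below it. The cohort numbers below are invented for illustration, and the result is only a candidate to validate causally.

```python
# Hypothetical cohort: friends added in the first 10 days -> (users, retained users).
cohort = {
    0: (1000, 100), 1: (800, 96), 2: (600, 84), 3: (500, 80),
    4: (400, 72), 5: (300, 60), 6: (250, 55), 7: (200, 120),
    8: (150, 93), 9: (100, 64), 10: (80, 52),
}

def retention(friend_counts):
    """Pooled retention rate across the given friend-count buckets."""
    users = sum(cohort[k][0] for k in friend_counts)
    kept = sum(cohort[k][1] for k in friend_counts)
    return kept / users

best_threshold, best_gap = None, 0.0
for threshold in sorted(cohort)[1:]:
    below = [k for k in cohort if k < threshold]
    at_or_above = [k for k in cohort if k >= threshold]
    gap = retention(at_or_above) - retention(below)
    if gap > best_gap:
        best_threshold, best_gap = threshold, gap

print(f"candidate magic moment: {best_threshold} friends (retention gap {best_gap:.2f})")
```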

Active use

Similarly, as we’re counting customers, it’s important to have a meaningful definition of active use. This helps provide a clear view of “who is a customer?”, and helps prevent over-counting customer adds and churn for users that were not truly customers yet.
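For instance, an active-use definition might require a minimum number of core actions within a trailing window. The sketch below assumes a 28-day window and a threshold of two actions; both values, along with the event log, are hypothetical and would need to be tuned for each product.

```python
from datetime import date, timedelta

# Hypothetical event log: (user_id, date of a core action such as sending an email).
events = [
    ("u1", date(2023, 7, 1)), ("u1", date(2023, 7, 3)), ("u1", date(2023, 7, 10)),
    ("u2", date(2023, 5, 2)),   # last seen long ago
    ("u3", date(2023, 7, 12)),  # signed up, only one action so far
]

def active_users(events, as_of, window_days=28, min_actions=2):
    """Users with at least `min_actions` core actions in the trailing window."""
    start = as_of - timedelta(days=window_days)
    action_counts = {}
    for user, day in events:
        if start < day <= as_of:
            action_counts[user] = action_counts.get(user, 0) + 1
    return {user for user, n in action_counts.items() if n >= min_actions}

active = active_users(events, as_of=date(2023, 7, 15))
print(active)
```

Under this definition, a user who signs up and performs a single action is not yet counted as a customer, so their departure would not register as churn.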

Setting targets

Once we identify our leading indicators and north star metrics, typically, the next step is to determine targets for the goals. Often we will start with a forecast of the metric. The forecast represents where we will end up (within confidence intervals), assuming we continue the current level of investment. Typically, we will set an aspirational goal above that forecast, which represents the level of ambition for accelerated growth. Known events, changing budgets, and market growth levels are additional factors we can take into account.
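As a minimal sketch of this process, the code below fits a linear trend to a metric’s history, extrapolates it as the baseline forecast, and sets a target above it. The monthly values and the 10% stretch factor are assumptions for illustration. A production forecast would also model seasonality and report confidence intervals, which are omitted here.

```python
def linear_forecast(values, periods_ahead):
    """Fit y = a + b*t by least squares and extrapolate `periods_ahead` steps."""
    n = len(values)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(values) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, values)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + periods_ahead)

# Hypothetical monthly active users (thousands) for the last six months.
monthly_active_users = [100, 104, 108, 112, 116, 120]

forecast = linear_forecast(monthly_active_users, periods_ahead=6)
stretch = 0.10  # assumed level of ambition above the baseline forecast
target = forecast * (1 + stretch)
print(f"forecast={forecast:.1f}, target={target:.1f}")
```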

Meta-analysis can also be a useful input for opportunity sizing: we can review the distribution (max, median, average) of how much past feature investments have moved these metrics.
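Summarizing past results for opportunity sizing can be as simple as the sketch below. The list of lifts is hypothetical; in practice it would come from the experiment history for the metric in question.

```python
import statistics

# Hypothetical percent lifts on a leading indicator from past feature launches.
past_lifts_pct = [0.2, 0.5, 0.1, 1.4, 0.3, 0.8, 0.4]

summary = {
    "max": max(past_lifts_pct),
    "median": statistics.median(past_lifts_pct),
    "mean": statistics.mean(past_lifts_pct),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}%")
```

The gap between the median and the max gives a sense of what a typical launch delivers compared with an exceptional one, which helps calibrate how aggressive a target is.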

Organizing for success

Across the company, different teams may be better positioned than others to move certain metrics. For example, Growth drives adoption, Product experiences can drive engagement, Support drives satisfaction, etc. (Meta-analysis across teams’ experiments can provide a quantitative view of which metrics they’ve been able to most successfully move in the past.)

The leadership team drives portfolio planning to ensure we have the right composition of teams to accomplish the desired goals, and then teams can proceed with their specific focus. As a company grows, there can be north star metrics, leading indicators, and guardrails at successive levels (e.g., the company and team levels). We can use the cascading OKR framework to help the team and company levels connect.

Ship criteria

Sometimes an experiment can move two metrics in opposite directions. This can present a “launch or not” dilemma for the team running the experiment. Of course, the most desirable outcome is to find another implementation of the feature that better accommodates both metrics. However, this may not always be possible. One way to approach this situation is to consider the relative priority of the two metrics. Another way to manage the “tie breaker” is to weigh the relative magnitudes of the two metric changes and develop ship criteria that quantify acceptable tradeoffs. For example, in order to help “good” (authentic) users sign up easily, how much potential fraud can we allow (for cases that are inconclusive)? Comparing the lifetime value of a good user with the cost of fraud can help optimize this tradeoff. (If we can also limit the exposure from fraudulent sign-ups through progressive access, that will let more good users through as well.) Another tradeoff might be the increased engagement from sending user notifications versus the number of users who then turn their notifications off. A similar tradeoff can arise between adoption and revenue. Guardrails are a key aspect of this as well; for example, a change that degrades product performance below user expectations cannot be shipped.
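The sign-up example can be turned into a simple ship criterion by comparing expected values. The sketch below assumes a single lifetime value for a good user and a single cost per fraudulent account; both numbers are hypothetical, and a real model would use distributions rather than point estimates.

```python
def expected_value_per_signup(p_fraud, ltv_good_user, cost_per_fraud):
    """Expected value of admitting one inconclusive sign-up."""
    return (1 - p_fraud) * ltv_good_user - p_fraud * cost_per_fraud

def break_even_fraud_rate(ltv_good_user, cost_per_fraud):
    """Fraud share at which admitting inconclusive sign-ups stops paying off."""
    return ltv_good_user / (ltv_good_user + cost_per_fraud)

ltv, fraud_cost = 120.0, 480.0  # hypothetical dollar amounts

threshold = break_even_fraud_rate(ltv, fraud_cost)
observed_fraud_rate = 0.10  # assumed share of fraud among inconclusive cases
ship = expected_value_per_signup(observed_fraud_rate, ltv, fraud_cost) > 0

print(f"break-even fraud rate={threshold:.0%}, ship={ship}")
```

Limiting exposure through progressive access lowers the effective cost per fraudulent account, which raises the break-even rate and lets more good users through.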

Monitoring production metrics

Over time, it’s good to check in on the effectiveness of your metrics. The north star metric represents the overall mission, so it should be stable and rarely change. There may be improvements to how the metric is implemented in production, as measurement bugs and edge cases are found or as the product and its feature instrumentation change. (Company-level metrics should be treated as production-level priorities and should also have data lineage, data quality monitoring, and SLAs in place.) Still, it’s worth reflecting every 6–12 months to confirm that the metric still reflects the top priority for the product. Also, if you’re seeing adverse side effects or gaming of the metric, there may be additional guardrails to put in place or edits to make to the metric definition (e.g., stop counting a particular behavior you don’t want to promote). If you change the metric, make sure to version it and update the data catalog. (You can also backfill historical data with the new metric to analyze long-term trends.)

Conclusion

Leading indicators and north star metrics are key aspects of product development and provide a clear focus for the broader organization. Data science teams play a key role in defining and validating these metrics, which in turn enable customer success.

Related links

Thinking “left to right” (OKRs)
The Role of Product Data Science
Retain more customers by understanding churn
Calculating customer lifetime value: A Python solution
The data scientist toolbelt

Further reading

Measure What Matters: OKRs by John Doerr
Trustworthy Online Controlled Experiments by Kohavi, Tang, Xu

