Don’t Frustrate Your Data Scientists (If You Want Them to Stay)
Last Updated on November 3, 2022 by Editorial Team
Author(s): Stu Bailey
As I speak with data scientists, especially those working in Global 1000 companies, many express concerns about their situation. In some sense, they’re victims of their own success: Data scientists are producing models that are making substantial contributions to the business, and thus more and more models are being used in production applications. But as a result, data scientists face several challenges. In my conversations, the following issues come up most frequently:
- Their organization lacks visibility into the business contributions being made by the models they produce
- They’re spending more and more time dealing with operational issues for their models in production
The causes of both issues are very consistent across most organizations, and as such, lend themselves to straightforward solutions. This is good news for both data scientists and the organizations that they work for – provided that organizations act, and do so with some urgency.
“Your Model Is Broken – Let’s Have a Meeting!”
Once developed and deployed into production, AI models can be very sensitive to a host of conditions that can compromise their effectiveness, reduce their value and increase their associated risks. Some of these items, such as data drift, relate directly to the work of the data scientist and require their expertise to address. But there are many other items that can impact models in production that have little to do with data science. For example, a problem with a production data pipeline can cause model outputs to deviate from accepted limits, or even produce erroneous inferences. A problem with the production IT infrastructure in which the model executes can cause performance issues. In many cases, there may be no meaningful role for the data scientist in addressing the problem. But that doesn’t spare them from becoming involved.
In many organizations, the response to a problem with an AI-driven application is to pull together a meeting with representatives from the data team, IT team, DevOps team, compliance team – as well as data science – in the hope of quickly identifying and addressing the root cause. These meetings often conjure the story of the blind men trying to describe an elephant: Each can describe the part of the elephant that they hold, but no one can describe the whole beast. As a result, a lot of time can be wasted – and value lost – as the group tries to assemble a complete picture of the problem and determine a fix.
I’ve yet to encounter a data scientist who isn’t committed to making sure that their model is operating effectively and within its thresholds. What they don’t appreciate is being called into situations in which the problems ultimately had nothing to do with the model. They’re generally fine to play a role in monitoring their models that have reached production, but they don’t want to spend their time chasing issues that they have no role in fixing.
“My Models are Making Big Contributions – Believe Me”
Organizations have been pouring millions into AI initiatives in pursuit of big returns, and for the most mature organizations, those investments are generating significant returns. But many organizations struggle to quantify the value that their initiatives are contributing. This is increasingly important as budgets tighten and more AI projects compete for funds. This directly impacts data scientists, who want their contributions to be recognized and appropriate rewards to flow to them and their projects. Of course, a lack of visibility into the contributions of AI models is not just an issue for data scientists: The inability to accurately assess business contributions imperils all enterprise AI initiatives.
ModelOps to the Rescue
ModelOps is a core capability that enables organizations to govern and scale their AI initiatives. An effective ModelOps capability enables an organization to standardize and automate the operational processes for all models in production, but without restricting data scientists or any other team from using the most appropriate tooling and infrastructure for each use case. It also provides the enterprise – senior executives, IT staff, data teams, compliance teams, business teams, and of course data scientists – with business metrics that show the contributions, costs, and ROI of each production model.
The most effective enterprise ModelOps capabilities are built around a platform that is independent of any data science tool, data system or execution infrastructure. Instead, it integrates with whatever tooling and systems are used across the enterprise, including enterprise systems for security and access management, ticketing, risk management, compliance, etc. The ModelOps platform maintains an evergreen database of all models in production, regardless of origin or execution environment, along with all artifacts including algorithms, training data, approvals and the like. It includes active monitors that continuously check the full gamut of statistical, ethical, performance, security, business and compliance KPIs, route issues to those responsible and track resolution – eliminating the need for “hunting trips” to find the root cause of problems and freeing data scientists, and everyone else, to focus their time on their core responsibilities. A mature ModelOps platform also integrates with business systems to enable automated generation of model business metrics and ROI.
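To make the routing idea concrete, here is a minimal Python sketch of a monitor that raises an alert and a router that sends it to the owning team. It is an illustration only, not how any particular ModelOps platform works: the `Alert` class, the category names, the team names in `OWNERS`, and the `check_latency` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    model_id: str
    category: str  # hypothetical labels, e.g. "statistical", "performance"
    detail: str


# Hypothetical mapping from alert category to the team that can fix it.
OWNERS = {
    "statistical": "data-science",       # e.g. data drift
    "data_pipeline": "data-engineering",
    "performance": "it-operations",
    "security": "security",
    "compliance": "compliance",
}

# Assumed fallback queue for categories nobody has claimed yet.
DEFAULT_OWNER = "modelops-triage"


def route(alert: Alert) -> str:
    """Return the team responsible for this alert's category."""
    return OWNERS.get(alert.category, DEFAULT_OWNER)


def check_latency(model_id: str, p95_ms: float, limit_ms: float) -> Optional[Alert]:
    """Raise a performance alert when latency exceeds its limit, else None."""
    if p95_ms > limit_ms:
        detail = f"p95 latency {p95_ms} ms exceeds limit {limit_ms} ms"
        return Alert(model_id, "performance", detail)
    return None
```

In this sketch, a latency breach goes to IT operations and a drift alert goes to data science, so the data scientist is only pulled in when the problem is actually theirs to fix.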
For organizations that are frustrated with the demands of managing models in production and want to further scale their AI initiatives – and retain the best data scientists – there’s an answer: Implement ModelOps today.
Bio: Stu co-founded ModelOp and serves as Chief Enterprise AI Architect. Stu’s background as a technologist and entrepreneur, providing critical data-intensive infrastructure to the world’s largest enterprises, gives him a unique perspective on how to help large, diversified enterprises become AI and Model-Driven.
As the technical lead for the National Center for Data Mining from 1994-2000, Stu played key roles in the development of high-performance computing and distributed machine learning platforms, including the development of the Predictive Model Markup Language (PMML). In 2000 he founded Infoblox, a category-defining market leader, serving as Chief Technology Officer and Chief Scientist while the company grew to serve 12,000 enterprises, including most of the Global 2000. Infoblox automates critical operational systems required for the large enterprise to effectively deploy and operate Internet and Cloud initiatives. Following Infoblox’s successful IPO and later acquisition by Vista Partners in 2016, Stu co-founded ModelOp to address the now-apparent challenges with governing and operationalizing AI and Model-Driven initiatives at scale.
Stu holds a Bachelor of Science in computer engineering from the University of Illinois at Chicago and is a named inventor on over thirty patents relating to distributed data systems and model operations.