A New Paradigm in MLOps: Building Regulatory-Compliant Systems
Last Updated on July 26, 2023 by Editorial Team
Author(s): Supriya Ghosh
Originally published on Towards AI.
Data scientists, ML engineers, and developers are generally familiar with MLOps and its framework. There is a plethora of articles and tutorials online that explain it, and the term is on the mind of nearly every AI expert, researcher, and practitioner.
But do the following questions sound familiar?
1. What is MLOps for regulatory-compliant systems with embedded ML software?
2. What does it include?
3. How is the framework designed?
Let us look at each piece in detail.
MLOps — A brief
For AI and ML software, MLOps starts with inception and experimentation and continues through deployment and production. It does not stop there: it runs through repeated cycles of development, deployment, production, and monitoring, supported by a CI/CD (Continuous Integration/Continuous Delivery) pipeline. Along the way it covers data transformations, model optimization, model validation, the model registry, versioning, central storage, access controls, governance, and more.
MLOps embraces automation, monitoring, and governance at every step of the ML software lifecycle, from inception onward.
This involves three major steps:
1. Identifying and putting together the data for training and developing a model.
2. Experimenting with different models to find the best-performing one.
3. Deploying and using the final model in production.
Together, these steps form a complex workflow.
Overview of Regulation and Compliance
MLOps becomes even more challenging when it has to be adopted and adapted for regulatory-compliant systems and domains that involve AI and ML software, such as healthcare, medical devices, aerospace, defense, and automotive. These domains require additional activities within the cycle beyond what is already present, which adds considerable effort, time, and cost.
It is “adopted and adapted” because the MLOps framework is borrowed for such domains to add value, but its established practices and processes need major modifications to accommodate the requirements of regulatory compliance and certification.
These regulation- and certification-centric systems require more plan-driven approaches to avoid hazards to people and the environment and to mitigate risks in the process.
Regulatory bodies often need time to certify a product, whereas software development relies on continuous delivery and demands quick iterations and increments.
ML- and AI-driven systems in particular demand continuous changes to the application code, the model used for prediction, and the data used to develop the model. This poses a different set of challenges for verification and validation activities and further complicates the regulatory compliance and certification approval process.
So, the right approach is to strike a balance between plan-driven and agile development methods. This ensures that enough attention is paid to risk management and safety engineering alongside other practices.
Regulatory bodies address requirements affecting the public interest, such as health, safety, and the environment. They focus on outcomes and hazards rather than on technical solutions. Hence, organizations always face the challenge of demonstrating conformity.
For most organizations, developing a regulatory-compliant product means clearly understanding the applicable regulatory requirements and determining a strategy accordingly from the beginning. The strategy is implemented by choosing an appropriate ML model that is deployed to production in a “locked” state, which is elaborated later in the article. This strategy makes verification and validation of the end product more tractable.
In this regard, the first activity is to identify the gaps in current MLOps practices. The gaps reveal the specific areas that need attention and modification in order to comply with regulatory requirements.
Once these gaps are identified, actions are taken to bring the existing pipelines up to an appropriate maturity level under the regulatory framework.
The most probable gap areas relate to the model and its versioning, the multiple datasets used to train the model and their versioning, and the monitoring of model output to detect bias and other problems.
The essential part of the framework is the inclusion of risk management and safety engineering alongside the modifications in these gap areas.
All of these must be added within one framework under a single umbrella to create regulatory-compliant ML systems.
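As a rough illustration of the model and dataset versioning gap mentioned above, the following sketch ties a model artifact, the datasets used to train it, and the code revision into a single traceable record. The function names (`sha256_of`, `lineage_record`), the record fields, and the sample values are all hypothetical and not taken from any particular MLOps tool; a real system would more likely rely on a model registry.

```python
import hashlib
import json


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def lineage_record(model_bytes: bytes, datasets: dict[str, bytes], code_revision: str) -> dict:
    """Build a record linking one model artifact to its datasets and code revision.

    All field names here are illustrative assumptions.
    """
    record = {
        "model_sha256": sha256_of(model_bytes),
        "datasets": {name: sha256_of(blob) for name, blob in sorted(datasets.items())},
        "code_revision": code_revision,
    }
    # Hash the canonical JSON form so the record itself gets a stable identifier.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["record_id"] = sha256_of(canonical)[:12]
    return record


# Stand-in byte strings; in practice these would be file contents.
record = lineage_record(
    model_bytes=b"serialized-model",
    datasets={"train": b"training rows", "validation": b"validation rows"},
    code_revision="abc1234",
)
print(record["record_id"])
```

Because the identifier is derived from the content, any change to the model, to a dataset, or to the code revision yields a different record, which gives reviewers a simple way to check what exactly was validated.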
Since ML and AI systems can improve by learning from data while in production, this raises questions about the safety and performance of their autonomous operation in a regulated domain. As a result, an approach of “locked” algorithms has been adopted: the system is trained during the development phase, but its ability to improve is disabled in real-world use. The general-purpose pipeline remains active and can deliver retrained models continuously throughout the application lifecycle. However, once a model is chosen for deployment, it is “locked” after the packaging stage, and the monitoring phase is limited to validating the locked model until the regulatory requirements are met.
If a changed model is later to be considered for deployment and production, the retrained, packaged model is “locked” again and undergoes validation for regulatory compliance and certification approval. The process continues until the entire system demonstrates conformity with the reference standards and regulations required for certification.
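The “locked” lifecycle described above can be sketched as a small state machine. This is only an illustration under assumptions: the stage names, the `ModelRelease` class, and its methods are hypothetical, and real regulatory workflows involve far more than a status flag.

```python
from enum import Enum


class Stage(Enum):
    CANDIDATE = "candidate"  # still being produced by the training pipeline
    LOCKED = "locked"        # frozen after packaging, under validation
    APPROVED = "approved"    # validation finished, cleared for production


class ModelRelease:
    """Hypothetical wrapper that forbids changes once a model is locked."""

    def __init__(self, version: str, weights: bytes):
        self.version = version
        self._weights = weights
        self.stage = Stage.CANDIDATE

    def update_weights(self, weights: bytes) -> None:
        # Learning is only allowed before the model is locked.
        if self.stage is not Stage.CANDIDATE:
            raise PermissionError(
                f"model {self.version} is {self.stage.value}; retrain as a new version"
            )
        self._weights = weights

    def lock(self) -> None:
        if self.stage is not Stage.CANDIDATE:
            raise ValueError("only a candidate model can be locked")
        self.stage = Stage.LOCKED

    def approve(self) -> None:
        if self.stage is not Stage.LOCKED:
            raise ValueError("only a locked model can be approved")
        self.stage = Stage.APPROVED


release = ModelRelease("1.0.0", b"initial weights")
release.update_weights(b"tuned weights")  # allowed while still a candidate
release.lock()                            # from here on the model is frozen
```

A retrained model would enter this flow as a new `ModelRelease` with its own version, which mirrors the idea that every changed model is locked and validated again.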
Development workflow of the Regulatory Compliant ML and AI System
The flow involves two nested cycles of development tasks.
The first is the inner cycle, consisting of daily development tasks with shorter iterations. The second is the outer cycle, consisting of the formal tasks and activities required for final compliance and approval of the software release. The different levels of tasks are assigned to different people according to their role and competence, and the tasks are completed asynchronously as needed.
Design requirements are reviewed at the release level. Once these are accepted, architectural design is undertaken within the development cycle. The architectural design is verified against the requirements and then used to prepare the high-level design and the detailed unit (low-level) design. Once the detailed unit design is verified and accepted, development activities kick off, along with unit testing, integration testing, and system testing to verify the system or product.
When all requirement, design review, and testing activities have been completed successfully over the required number of development iterations, the work is transferred to the release cycle. In the release cycle, the software release goes through the final level of integration and regression testing, along with final reviews and the verification and validation stages needed for final regulatory approval.
A release decision implies that risk management and safety engineering activities are complete; therefore, risk managers, safety officers, and compliance officers are typically involved.
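One way to picture the hand-over from the inner development cycle to the outer release cycle is a gate that checks whether every required piece of evidence is in place. The sketch below is a simplification under assumptions: the evidence names in `REQUIRED_EVIDENCE` are hypothetical labels for the activities described above, not an official checklist from any regulation.

```python
# Hypothetical evidence items, loosely mirroring the activities in the text.
REQUIRED_EVIDENCE = (
    "requirements_reviewed",
    "design_verified",
    "system_tests_passed",
    "regression_tests_passed",
    "risk_management_complete",
    "safety_engineering_complete",
)


def release_blockers(evidence: dict[str, bool]) -> list[str]:
    """Return the required items that are missing or not yet satisfied."""
    return [item for item in REQUIRED_EVIDENCE if not evidence.get(item, False)]


def can_release(evidence: dict[str, bool]) -> bool:
    """A release is possible only when nothing is blocking it."""
    return not release_blockers(evidence)


evidence = {item: True for item in REQUIRED_EVIDENCE}
evidence["risk_management_complete"] = False
print(release_blockers(evidence))
print(can_release(evidence))
```

Treating missing entries as unsatisfied keeps the gate conservative: an activity that was never recorded blocks the release in the same way as one that failed.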
Risk management and safety engineering are considered umbrella activities and are included throughout the development process, in the inner as well as the outer cycle. Both start with the requirement and design initiation phase and continue until the end, when regulatory compliance and certification are obtained.
In parallel, the performance of the “locked” model in production is actively and continuously monitored.
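A minimal sketch of such monitoring, assuming that ground-truth labels eventually become available for production predictions, is a rolling accuracy check over the most recent results. The `PerformanceMonitor` class, the window size, and the threshold are hypothetical choices for illustration; note that the monitor only raises a flag and never modifies the locked model.

```python
from collections import deque
from typing import Optional


class PerformanceMonitor:
    """Track accuracy of a locked model over the most recent predictions."""

    def __init__(self, window: int, min_accuracy: float):
        self.outcomes = deque(maxlen=window)  # True for each correct prediction
        self.min_accuracy = min_accuracy

    def record(self, prediction, actual) -> None:
        self.outcomes.append(prediction == actual)

    def accuracy(self) -> Optional[float]:
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self) -> bool:
        # Only judge once the window is full, to avoid alarms on tiny samples.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.min_accuracy


monitor = PerformanceMonitor(window=4, min_accuracy=0.75)
for prediction, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(prediction, actual)
print(monitor.accuracy(), monitor.needs_review())
```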
The automated continuous training pipeline operates inside the controlled environment and fetches new data from the sources. All input data is validated. If inconsistencies are found during data validation, the pipeline's execution is halted and the abnormality is resolved through manual intervention. This keeps the development team's access restricted to the environment in which the continuous training pipeline is installed and maintained, without affecting the locked model and its sequence of compliance and certification activities.
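The halt-on-inconsistency behavior can be sketched as a validation step that runs before training and raises an error instead of continuing. The field names and ranges in `EXPECTED_RANGES`, the exception class, and the `train` callback are all assumptions made for this example; real pipelines typically use a dedicated data validation tool.

```python
class DataValidationError(Exception):
    """Raised to halt the pipeline so that a person can inspect the data."""


# Hypothetical schema: field name -> (minimum, maximum) allowed value.
EXPECTED_RANGES = {"age": (0, 120), "measurement": (0.0, 500.0)}


def validate_batch(rows: list[dict]) -> list[str]:
    """Return a description of every inconsistency found in the batch."""
    problems = []
    for index, row in enumerate(rows):
        for field, (low, high) in EXPECTED_RANGES.items():
            value = row.get(field)
            if value is None:
                problems.append(f"row {index}: missing {field}")
            elif not (low <= value <= high):
                problems.append(f"row {index}: {field}={value} outside [{low}, {high}]")
    return problems


def run_training_pipeline(rows: list[dict], train):
    """Validate first; train only if the batch is clean."""
    problems = validate_batch(rows)
    if problems:
        # Halt here: no retraining happens until someone resolves the issue.
        raise DataValidationError("; ".join(problems))
    return train(rows)


clean = [{"age": 40, "measurement": 12.5}]
print(run_training_pipeline(clean, train=len))  # `len` stands in for a real trainer
```

Raising an exception rather than silently dropping bad rows matches the text: the run stops, and the abnormality is left for manual resolution.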
Conclusion
Although this MLOps framework aids regulatory compliance at a time when guidelines for machine learning systems are becoming increasingly stringent, the process and framework are still evolving. Hence, the framework considered here is not final and will continue to be adapted as requirements change. Throughout, monitoring and governance remain of immense importance: with strong governance and monitoring in place, it becomes possible to achieve the goal and keep things on track.
References
1. Granlund, T., Stirbu, V. & Mikkonen, T. Towards regulatory-compliant MLOps: Oravizio Journey from a Machine Learning Experiment to a Deployed Certified Medical Product. SN COMPUT. SCI. 2, 342 (2021). https://doi.org/10.1007/s42979-021-00726-1
2. https://www.enterprisetimes.co.uk/2020/02/06/why-governance-comes-first-in-mlops/