Auditing Predictive A.I. Models for Bias and Fairness
Author(s): Eera Bhatt
Originally published on Towards AI.
Recently, two researchers published a paper offering guidance for auditing predictive A.I. models for ethical problems. In this context, an audit is an inspection of a predictive model conducted specifically to evaluate its fairness.
The authors outline several strategies for making sure that a predictive A.I. model aligns with accepted ethical standards.
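To make "evaluating fairness" concrete, here is a minimal sketch of one common check an auditor might run: comparing the model's positive-prediction rates across demographic groups. The data and column names are invented for illustration, and this is a generic fairness metric, not the specific procedure the paper prescribes.

```python
import pandas as pd

# Hypothetical audit table: one row per person, with the model's
# binary prediction and a sensitive attribute (the group label).
df = pd.DataFrame({
    "prediction": [1, 0, 1, 1, 0, 1, 0, 0],
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# Positive-prediction rate within each group.
rates = df.groupby("group")["prediction"].mean()

# Demographic parity gap: the spread between the highest and lowest
# group rates. A large gap is a red flag worth investigating.
gap = rates.max() - rates.min()
print(rates.to_dict())                        # {'A': 0.75, 'B': 0.25}
print(f"Demographic parity gap: {gap:.2f}")   # 0.50
```

A gap near zero doesn't prove the model is fair on its own, but a large one tells the auditors exactly where to start digging.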
Just to be clear, these audits don't have to be done by the people who developed the model. Auditors can include the employees who built the model, consultants hired from another firm, or any qualified independent auditors with no connection to the company.
Since the authors published their work with the American Psychological Association, we can probably assume that it involves (you guessed it) psychology.
In fact, throughout the paper, the authors present 12 key components, grounded in human psychology, to consider when auditing predictive A.I. models.
The authors divide these 12 components into three categories:
- Model-related components. Source data, design, model development and features, and outputs.
- How information about the model is shared. Written and verbal reports about what the model does and how the algorithm works, including discussions and presentations. The way potential users interpret these model descriptions. An outside party's understanding of how the model can be applied usefully.
- Meta-components. How the model's development might need to change when it is applied to people of different cultures. Evidence that the model complies with generally accepted ethical standards.
For the sake of this blog post, let's dive into one factor from each of these three categories and explore what they actually mean.
Model-Related: Input Data
As I write articles like this one, I should tailor my writing to the audience that reads it. Otherwise, I have no idea which topics to cover, how to write about them, or which points to highlight most. Similarly, when developers sit down to create a predictive A.I. model (the best ones usually aren't built in one sitting), they should tailor it to its users.
Training. Developers should train their model on input data similar to what it will encounter in production. For instance, if I create a predictive A.I. model to serve patients at a retirement home, I shouldn't train it on images of kids. Otherwise, the model might be great at making health predictions for children, but it won't know how to handle older patients, whose bodies are very different. (This is an extremely basic example, but it's still important.)
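One way to catch this kind of mismatch early is to compare the distribution of a key feature in the training set against the population the model will actually serve. Here is a hedged sketch using made-up ages and a standard two-sample Kolmogorov-Smirnov test from SciPy; the specific check is my illustration, not one the paper mandates.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)

# Made-up ages: the training set skews young, while the model will
# actually serve elderly retirement-home patients.
train_ages = rng.normal(loc=35, scale=10, size=500)
deploy_ages = rng.normal(loc=78, scale=8, size=500)

# Two-sample Kolmogorov-Smirnov test: a tiny p-value means the two
# distributions differ, i.e., the training data does not look like
# the population the model will serve.
stat, p_value = ks_2samp(train_ages, deploy_ages)
print(f"KS statistic: {stat:.2f}, p-value: {p_value:.2e}")
```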
Data sources. At the same time, though, acquiring helpful input data can be hard, depending on what is actually available. Some models are trained on data scraped directly from the Web, but what if not enough data exists online? Sometimes data has to be collected during trials, or even generated (see my article about synthetic data).
Often, though, there is a publicly available dataset online that is similar to what you're looking for. Before using it, just check for any special permissions you might need.
Candidly, this component should be thought out very early in the process of developing or improving a model. When using sampling techniques for this kind of data, a solid background in statistics is definitely helpful.
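Stratified sampling is one standard technique from that statistical toolbox: sampling within each demographic group so that smaller groups are represented by design rather than by luck of the draw. A minimal sketch, with an invented candidate pool (pandas' `GroupBy.sample` does the per-group draw):

```python
import pandas as pd

# Made-up pool of candidate training records, heavily skewed toward
# younger people.
pool = pd.DataFrame({
    "record_id": range(1000),
    "age_group": ["18-40"] * 700 + ["41-65"] * 200 + ["65+"] * 100,
})

# Stratified sampling: draw the same fraction from every age group,
# so each group appears in the sample in known proportion.
sample = pool.groupby("age_group").sample(frac=0.1, random_state=0)
print(sample["age_group"].value_counts())
# 18-40    70
# 41-65    20
# 65+      10
```

Swapping `frac=0.1` for a fixed `n=50` would instead draw equal counts per group, deliberately oversampling the smaller ones.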
Information-Related: Third-Party Understanding
For some brief context, the authors of this paper consider three main groups of people as possible evaluators of the model:
- First-party: Those who develop the model. What do they think about the predictive model's performance?
- Second-party: Those who are directly affected by the model. For instance, if a model is used for the public health of senior citizens, what do they think about the model? How do the model's predictions impact them?
- Third-party: Outsiders, such as lawyers, who do not develop the model and aren't directly affected by it. What other comments or concerns does the general public have about this model?
For all three of these parties, especially the third, the audit isn't just about the model's accuracy. Even if the developers are convinced their model will succeed, not all auditors have experience in fields like machine learning. So how do non-developers contribute?
Simple descriptions. For the less technically skilled auditors to make evaluations, they need clear descriptions of exactly how the model will serve its users. For the most part, this explanation comes through detailed reports that break down the model's complicated inner workings into a simple process.
Even if a model is fantastic, this step is key. If developers don't communicate their work clearly, second- and third-party auditors will barely understand how fair the model is.
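One lightweight format for this kind of plain-language report is a "model card," an idea from the broader machine-learning community rather than from this paper. Every field and phrase below is invented, purely to show the shape such a summary might take:

```python
# A minimal, hypothetical "model card": a plain-language summary that
# non-technical auditors can evaluate without reading any code.
model_card = {
    "name": "Senior-care risk predictor (example)",
    "intended_use": "Flag retirement-home patients for nurse follow-up.",
    "not_intended_for": "Diagnosis, insurance pricing, or pediatric care.",
    "training_data": "De-identified health records of adults aged 65+.",
    "known_limitations": "Rural facilities are under-represented.",
    "fairness_checks": "Prediction rates compared across sex and ethnicity.",
}

for field, text in model_card.items():
    print(f"{field.replace('_', ' ').title():<18} {text}")
```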
Meta-Components: Cultural Context
In the information-related section, we defined each of the three parties of auditors. Recall that the second and third parties don't develop the model, so they probably aren't as technically skilled as the first party that created it.
This means the second and third parties might be less willing to voice disagreement with the decisions the developers made about the model.
Think about it: if you don't know anything about a skill like coding, how strong an opinion can you really have, with no prior knowledge to inform your views?
Since the second and third parties in these audits don't have predictive A.I. experience, if they notice strong signs of cultural bias during the audit, they may keep that honest opinion to themselves, which doesn't help anyone evaluate the model.
Authenticity. So when a model is audited, we have to encourage every auditor to report their true viewpoints instead of filtering them out.
The whole point of having diverse auditors is to evaluate the model's fairness across various cultures, and we should communicate this clearly to each auditor.
Conclusion. The authors do a great job describing effective audits. But they also note that although only three parties are discussed in their paper, more varied perspectives can enhance a model even more. For instance, if a predictive A.I. model uses computer vision, it certainly pays off to have an external computer vision expert evaluate the model too!
Until then, let's appreciate how many more people we can help just by auditing predictive A.I. models for bias and fairness.
Further Reading:
[1] Addressing equity and ethics in artificial intelligence (2024). Available at: https://www.apa.org/monitor/2024/04/addressing-equity-ethics-artificial-intelligence
[2] Auditing the AI Auditors: A Framework for Evaluating Fairness and Bias in High Stakes AI Predictive Models (2024). Available at: https://psycnet.apa.org/fulltext/2022-30899-001.pdf