
AI Erases Men Too: A Visual Test of Bias Across Four Leading Tools
Last Updated on April 16, 2025 by Editorial Team
Author(s): Sophia Banton
Originally published on Towards AI.

When it comes to identifying bias in AI, sometimes the clearest evidence is in the AI outputs themselves.
Four AI tools. Same prompt. Different results.
Framing the Question: What Does AI Actually See?
When you ask AI to generate an image of a professional group of men with specific traits like age, skin tone, and glasses, you expect it to follow the prompt.
But what if it doesn’t?
This wasn’t a request for diversity. I didn’t use words like “inclusive” or “representative.” I simply described the kinds of men I wanted to see: men who reflect the world I live in.
In a previous test focused on women, I found that many of those traits were ignored or overwritten. Glasses disappeared. Hairstyles and features became uniform. Cultural cues such as a bindi or braids were often missing or replaced.
This time, I turned the camera toward men.
Would AI repeat the same patterns? Would it follow the instructions more faithfully or default to a different set of assumptions?
To find out, I gave the same prompt to four major image generators: OpenAI GPT-4o, Microsoft Copilot (DALL·E 3), Midjourney, and Google ImageFX. OpenAI’s newest model was included because it had been released just a few days after I completed the study on women.
The Prompt: Specific Without Saying “Diverse”
I gave each tool the same carefully written prompt. I wasn’t vague. I asked for real men with visible features, including glasses, varied skin tones, different ages, and specific hairstyles. I described what I wanted to see, clearly and simply.
I asked for:
- A group of professional men from different cultural backgrounds
- Specific features like glasses, varied facial structures, and blazer colors
- A white background and white shirts with colored blazers
I used new accounts for each tool to avoid personalization. No re-rolling, no edits. I used only the first result. Because that’s how most people interact with AI. I wasn’t trying to fine-tune. I wanted to see the defaults. I wanted to see the truth again, just as I had done with the images of AI-generated women.
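For anyone who wants to rerun this single-shot setup programmatically rather than through the chat interfaces, here is a minimal sketch using the OpenAI Python SDK and its Images API with DALL·E 3. The prompt string below is a paraphrase reconstructed from the description above, not the exact wording used in the test, and the other three tools would still need to be prompted through their own interfaces.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Paraphrased reconstruction of the prompt described above
# (an assumption for illustration, not the exact test wording).
prompt = (
    "A group of professional men from different cultural backgrounds, "
    "with glasses, varied skin tones, different ages, and specific hairstyles "
    "including braids, wearing white shirts and colored blazers, "
    "photographed against a plain white background."
)

# Request a single image and keep only the first result,
# mirroring the no-re-rolling, first-output-only protocol.
response = client.images.generate(
    model="dall-e-3",
    prompt=prompt,
    n=1,
    size="1024x1024",
)

print(response.data[0].url)  # URL of the generated image
```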
The prompt was clear. The results were not.
Three of the four tools missed important elements. What returned wasn’t just a failure of color accuracy or style. it was a pattern. Skin tones were lightened or flattened. Cultural markers like braids or facial structure were altered or omitted. Details I clearly asked for were skipped.
There were also noticeable differences in how the AI drew and composed the men compared to what I observed when the tools were asked to generate images of women.
Tool by Tool: Who Followed the Prompt and Who Didn’t
OpenAI GPT-4o

GPT-4o delivered a technically correct result: a group of men in coordinated attire on a clean background. The composition included some age and ethnic variation but leaned toward lighter skin tones and similarly styled features. The image conveyed visual order more than individuality. Everyone looked polished, neutral, and composed. It resembled a professional team photo — orderly and conventional, with limited variation across the group.
Notably, the only darker-skinned man was shown with loosely wavy hair, even though the prompt requested braids. This echoes what I observed in the women’s test, where culturally specific traits were often missing or altered. This subtle deviation might seem minor, but it reveals a recurring gap in how AI depicts Black identity. If GPT-4o could render wavy hair but not braids, the omission raises an important question: was it a limitation of the model or a pattern shaped by the kinds of images it has seen most often?
Microsoft Copilot (DALL·E 3)

Copilot followed the basic structure but compromised the integrity of the prompt. It inserted a woman into an all-male prompt and erased entire identity groups, including South Asian and East Asian men. The image felt algorithmically “diverse” but narratively hollow, as if checking boxes without understanding what was asked.
The image leaned into stylized aesthetics: high fashion and modelesque poses. This diluted the prompt’s intent and reinforced a narrow definition of professionalism. These weren’t qualities I requested, but ones inferred by the model from the kinds of cultural images it has seen most often. It applied a look it assumed was appropriate for a professional man, regardless of whether it matched the prompt. In other words, it leaned toward algorithmic beauty.
And that invites a deeper question: what is beauty, exactly? Why am I even describing this output as algorithmically beautiful? I asked for humans, not cultural reminders of where we fit on the beauty scale. I wanted realism. I wanted specificity. But AI filled in the blanks with something we’ve taught it to love: symmetry, smooth skin, polished masculinity. This isn’t beauty in any universal sense. It’s beauty according to decades of advertising, film, and corporate visuals, refined and reinforced by millions of online images we’ve uploaded, liked, and shared. In that context, DALL·E hasn’t just mastered realism. It’s mastered us. It’s learned what we reward and reflect — and now it mirrors it back, perfectly curated, even when we never asked for it.
Midjourney

Midjourney’s response was artistically striking but clearly out of sync with the prompt. It ignored core instructions: the group showed limited age variation and lacked meaningful ethnic diversity, with no clear representation of South Asian or East Asian men, similar to what Copilot produced. Interestingly, every man in the image had a beard or visible stubble. The word “beard” never appeared in the prompt. The men appeared not only modelesque but athletic, styled with angular lighting and dramatic shadowing. One wore an earring, while others had long, carefully shaped hair that felt stylized and sensual rather than professional.
This contrast also highlights a broader issue often discussed in relation to women in media — the influence of stylization and physical idealization. However, it’s less commonly acknowledged that AI can apply those same objectifying aesthetics to men. In Midjourney’s rendering, the men do not resemble professionals in a workplace. Instead, they resemble stylized products: posed, physically idealized, and carefully lit. Compared to GPT-4o’s neutral, corporate-style composition, this version appears more performative, placing greater emphasis on visual impact than on realism or identity. Their expressions and postures evoke the tone of a fashion spread or advertisement. In this case, form overtook function. Style took precedence over specificity.
Google ImageFX

ImageFX was the only model that closely followed the prompt. Compared to the highly stylized or defaulted results from other tools, ImageFX grounded its response in realism, range, and detail. It rendered visible diversity in age, ethnicity, attire, and expression. Glasses appeared. Blazer colors matched the request. The men felt real. It was the only tool that depicted an East Asian man. The output didn’t feel curated; it felt human. ImageFX showed that it’s possible for AI to respond faithfully to a prompt without filtering identity through aesthetic bias. The capability is there, but it didn’t show up in the other tools.
Every major trait I requested was represented thoughtfully, with minimal deviation or stylization. However, it’s worth noting that one individual was missing a blazer, and two others wore colored undershirts. This is a minor variation from the coordinated blazer prompt, but one that may reflect subtle stylistic choices by the model. Additionally, the inclusion of a younger white subject alongside the older white man — something that also appeared in the women’s test — raises early questions about which demographics AI tools feel compelled to include for visual familiarity or perceived balance.
When AI Assumes, Omits, and Prioritizes
AI Fills Gaps Based on Familiar Patterns
AI doesn’t just follow prompts. It fills in the gaps based on what it has seen before. When those gaps are shaped by repeated visuals, cultural habits, or missing representation, the results often reflect old patterns without us realizing it. In this test, the AI tools added, erased, altered, or simplified details, even though they weren’t asked to. By doing so, they revealed what these tools have learned to prioritize, who they’re comfortable rendering, and who they consistently leave out.
Men Keep Detail, Women Lose It
In every image of men, glasses were rendered correctly. But when I tested the same detail with women, it disappeared consistently. This is not a coincidence.
It’s a clue.
It suggests that the models treat visual cues not as neutral details, but as aesthetic variables filtered through built-in assumptions about attractiveness, gender, and professionalism. Even when instructed to render glasses, the women were made to conform. They were polished, simplified, and stripped of variation. Men were allowed to keep complexity.
This behavior by the AI echoes a familiar message: the one teenage girls have long received — trade your glasses for contacts. Hide what makes you different. We’ve internalized that look. And now AI has, too.
Who Gets Included — And Why
This visual pattern wasn’t limited to how AI drew men. In both the women’s and men’s tests, ImageFX introduced a younger white subject alongside the older woman and man. This raises questions about which demographics AI tools consistently default to including, and why.
Then there’s this: AI is most consistent and confident when rendering white men. OpenAI’s image included at least three. Google ImageFX, the most prompt-aligned model, added another white man and centered the older white man in its output, just as it had centered the older white woman in the image of women it created. ImageFX also rendered the older white man with more visual detail and emphasis than the others. Additionally, both Midjourney and OpenAI’s GPT-4o placed a white man in the second row, dead center. He appeared to act as a visual anchor.
Braids and Cultural Accuracy
And what about the Black man with braids I asked for?
Only two tools featured braids. With Copilot, they were on a woman, not a man as specified in the prompt. However, Google ImageFX also rendered braids, and they were accurate and well-executed. This shows that at least some models are capable of drawing braids correctly across genders.
This raises a fundamental question: Could the model not render braids on a Black man? Or did it choose not to? Either way, it’s telling. If the model can draw braids but doesn’t, that’s a kind of defiance. If it can’t, that’s a form of ignorance. And both result in erasure.
These aren’t just technical misses. They are patterns: visual defaults that quietly shape who is centered, who is softened, and who disappears.
Revisiting the Women’s Test with GPT-4o

OpenAI’s 4o model was not available when the original women’s test was conducted. However, for fairness, the prompt was retroactively run through the updated model to evaluate how it would handle the same request.
The result is one of the strongest outputs across both tests. The image shows six women of visibly different racial backgrounds, all wearing white shirts and colored blazers. Glasses were included on both older women. Skin tones, facial features, and hairstyles are clearly individualized, and the composition reflects the intent of the prompt with care.
Compared to earlier models, GPT-4o captured the request with surprising precision. It didn’t just recognize diversity — it followed instructions. It was also the only image that returned realism for women instead of glamor. The portraits felt grounded and individualized, with fuller faces, visible age range, and natural expressions. Only two requested details were missing: freckles and a bindi.
But this also raises a new question: why did it follow the instructions so closely for women, but not for men?
Comparing the Two Tests: Women vs. Men

What We Saw in the Women’s Test
In the women’s test, two tools missed the mark while two came very close to honoring the prompt. Overall, the results were stronger than those observed for the men. OpenAI GPT-4o delivered nearly everything that was asked for, though it lacked freckles and a bindi. ImageFX captured every requested detail, including glasses, fuller faces, older women, and cultural cues like a bindi and braids. Copilot fell short, but at least followed the structure. Midjourney erased specificity but stayed within the bounds of stylized professionalism. And every tool clearly returned a group of women.
What We Saw in the Men’s Test
In this test, only ImageFX responded with care. Midjourney ignored the prompt and stylized the men into objects. GPT-4o returned a polished group that leaned toward uniformity and lightness in tone, a stark contrast to its near-perfect execution for the women. Copilot misread the prompt entirely and inserted a woman. The prompt was simpler this time, yet the results were worse.
This wasn’t just inconsistency. It was a pattern of defaulting: to sameness, to assumptions, and to familiar visual conventions about who belongs in the frame.
Interpretation
This was systematic deviation:
- The prompt was simpler (fewer cultural markers).
- The output was worse (less aligned with the prompt, more stylized, more defaults).
- No tool performed significantly better, not even the newly released GPT-4o.
If AI struggled to render women, it fumbled even more with men. The instructions were clear. The results weren’t. And in this second test, the gap between what I asked for and what I received was even wider.
Beyond Aesthetics: The Real-World Cost of Omission
This is not just about visual outputs. It’s about what AI is being trained to recognize and what it’s being trained to overlook. These tools are used to build workplace graphics, educational content, marketing visuals, and more.
If they miss important details when the prompt is clear, what happens when the prompt is vague? If they leave out identity in a test, what happens at scale?
Bias doesn’t always appear as harm. Sometimes it shows up as absence. As silence. Sometimes it shows up as erasure: a pattern of leaving out people, traits, or identities that don’t match what the system has learned to expect.
A Manifesto for Representation
The patterns we saw in AI-generated images of women have now shown up again in the images of men. This isn’t just about inconsistency. It’s about what AI sees and what it still doesn’t.
If AI is going to help us reflect the world, it has to see the people in it.
Not smooth us out. Not decorate us. Just see us, clearly and completely.
When AI fails to represent us truthfully, it doesn’t just distort pictures, it limits possibility. We have a responsibility to ask clearly for what we want to see — specificity, not stereotypes. Expect better. And build differently. Representation isn’t an option. It’s the baseline.
If AI is the lens, we need to ask who it focuses on and who it leaves out.
AI is the tool. But the vision is human.
About the Author
Sophia Banton is an Associate Director and AI Solution Lead in biopharma, specializing in Responsible AI governance, workplace AI adoption, and strategic integration across IT and business functions.
With a background in bioinformatics, public health, and data science, she brings an interdisciplinary lens to AI implementation — balancing technical execution, ethical design, and business alignment in highly regulated environments. Her writing explores the real-world impact of AI beyond theory, helping organizations adopt AI responsibly and sustainably.
Connect with her on LinkedIn or explore more AI insights on Medium.
Published via Towards AI
Note: Content contains the views of the contributing authors and not Towards AI.