

The Dark Side of AI: When Innovation Meets Exploitation

Last Updated on September 9, 2025 by Editorial Team

Author(s): MD. SHARIF ALAM

Originally published on Towards AI.


How the Promise of Artificial Intelligence is Built on a Foundation of Stolen Dreams

“I am utterly disgusted. I can’t watch this stuff and find it interesting. It’s an insult to life itself.” — Hayao Miyazaki, when shown AI-generated animations

The future arrived quietly, not with fanfare, but with the soft hum of servers scraping the internet. Every article you’ve written, every photo you’ve shared, every creative work you’ve poured your soul into — it’s all been harvested, processed, and transformed into training data for artificial intelligence systems that now compete with their very creators.

What we’re witnessing isn’t just a technological revolution; it’s the largest unauthorized appropriation of human creativity in history. And the stories behind this transformation reveal a troubling pattern of contradictions that expose the ethical vacuum at the heart of the AI boom.

Even as I write this piece about the ethical use of data and AI, the header image I used comes from the very kind of dataset practices I’m critiquing. This contradiction highlights how deeply embedded unconsented scraping and reuse of creative work has become in our digital ecosystem.

The Contradictions That Define Our AI Era

1. The Heartbreak of Hayao Miyazaki: When Dreams Become Data

In March 2025, the internet exploded when OpenAI's GPT-4o began generating near-perfect replicas of Studio Ghibli artwork. Users could now generate Studio Ghibli-style images on the fly: images that were not merely inspired by the storied animation house but were, in the words of critics, "straight rips of Ghibli's style."

The cruelty of this development becomes clear when you understand Miyazaki’s relationship with AI. In a 2016 documentary, when shown AI-generated animations, Miyazaki called them “an insult to life itself,” expressing his utter disgust with the technology. This wasn’t just aesthetic criticism — it was a fundamental rejection of soulless creation.

Yet here was his life’s work, scraped from the internet and regurgitated by machines that understood nothing of the decades of hand-drawn passion behind every frame. As TechCrunch noted, “The Ghibli situation struck a particularly strong nerve among fans since the studio’s mastermind, Hayao Miyazaki, has been vocal about his hatred for AI-generated artwork.”

The very artist who called AI “an insult to life” had his life’s work stolen to feed the machine he despised.

2. Anthropic’s $1.5 Billion Conscience

Anthropic, the company behind Claude AI, reached a staggering $1.5 billion settlement with authors whose books were used without permission to train its AI models. The company allegedly used pirated digital books from shadow libraries such as Library Genesis to train Claude.

A company founded on principles of AI safety and ethics built its flagship product on literary piracy, then paid the largest copyright settlement in history to make it right.

3. The Artists Who Sue Their Own Creations

Karla Ortiz, the concept artist behind iconic characters in Black Panther, Avengers: Infinity War, and Doctor Strange, found herself in the surreal position of suing AI companies that had trained on her own work. In August 2024, a judge advanced copyright claims by artists against AI art generators, declining to dismiss infringement claims.

Getty Images discovered more than 15,000 photos from its library in Stable Diffusion’s training dataset, leading to a major lawsuit. These weren’t obscure images — these were professionally licensed photographs being used without permission or compensation.

The same creative works that established these artists’ careers were being used to train their replacements.

4. The Vanishing Evidence Trail

Perhaps most troubling is the story of OpenAI’s disappearing datasets. Court documents revealed that OpenAI destroyed two large “internet-books corpora” datasets (books1 and books2) that had been used to train GPT-3, with the researchers who compiled them no longer at the company.

The company advocating for AI transparency destroyed the very evidence that could prove the legitimacy of their training data.

5. The Insider’s Lament

Suchir Balaji, a former OpenAI researcher, publicly criticized his former employer’s data collection methods, saying that training models by scraping Internet text violated copyright law and damaged the ecosystem of content creators. He left the company partly due to these ethical concerns.

The very people building these systems concluded they were harmful and walked away.

6. When News Becomes “Fake News”

The Indian news agency ANI sued OpenAI, claiming not only that its content was used without permission but also that ChatGPT sometimes attributed fabricated news stories to the agency. This represents perhaps the most insidious contradiction of all.

AI systems trained on real journalism began producing fake news and attributing it to the very outlets whose credibility they were undermining.

7. The Paywalled Paradox

Research suggests that GPT-4o shows unusually high recognition of paywalled O’Reilly book content, much more than older models, indicating that the training data may have included more closed or paid content than OpenAI publicly acknowledged.

Companies that ask us to pay for AI subscriptions allegedly trained their models on content we already pay for.

8. The Privacy Pretense

Italy fined OpenAI €15 million for GDPR violations, finding that the company processed users’ personal data without providing sufficient transparency or legal basis. This wasn’t theoretical harm — real people’s private information was being processed at massive scale.

AI companies promise to enhance our lives while secretly harvesting our data to do so.

9. The Cultural Vandalism of Keith Haring

When someone used AI to “complete” Keith Haring’s intentionally Unfinished Painting, the art world reacted with horror. Critics called it “desecration” of the artist’s legacy. The work was meant to remain unfinished — completing it violated the artist’s intent.

AI doesn’t just copy art; it fundamentally misunderstands the meaning behind it.

10. The Dataset Nobody Wanted to See

The LAION-5B dataset, used to train numerous AI image models, was found to contain links to child sexual abuse material and other illegal content. The discovery led to a “cleaned” version being released, but the damage was done — AI models had potentially been trained on the worst content imaginable.

The datasets powering our “clean” AI tools were contaminated with society’s darkest content.

The Real Question: How Ethical Is YOUR AI?

These stories reveal a pattern that goes beyond simple copyright infringement. They show an industry built on a fundamental disrespect for human creativity, privacy, and consent. But as users, we’re complicit too. Every time we use these tools without questioning their origins, we participate in this system.

Evaluating AI Ethics: A User’s Guide

When using AI tools, ask yourself:

1. Data Provenance

  • Does the company clearly state how their training data was obtained?
  • Did they license content or simply scrape it?
  • Do they respect opt-out signals from creators?

2. Transparency

  • Can you find detailed information about what data was used?
  • Are there audit trails and accountability measures?
  • How do they handle copyright disputes?

3. Creator Rights

  • Does the company have licensing agreements with content creators?
  • Do they offer revenue-sharing with original creators?
  • Can creators easily request the removal of their work?

4. User Safeguards

  • Are there content filters to prevent harmful outputs?
  • Can you verify the sources of generated content?
  • Does the tool provide proper attribution where possible?
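
The opt-out question under "Data Provenance" can actually be checked mechanically: many sites signal that AI-training crawlers are unwelcome through their robots.txt file. Below is a minimal sketch using Python's standard-library robotparser; the list of crawler user agents is representative, not exhaustive, and a real audit would fetch each site's live robots.txt rather than a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# A few well-known AI-training crawler user agents (non-exhaustive).
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai"]

def ai_optout_status(robots_txt: str, url: str) -> dict:
    """Map each AI crawler name to True if robots.txt blocks it for `url`."""
    status = {}
    for agent in AI_CRAWLERS:
        rp = RobotFileParser()
        rp.parse(robots_txt.splitlines())
        # can_fetch() is True when the agent is allowed; invert for "blocked".
        status[agent] = not rp.can_fetch(agent, url)
    return status

# Example: a site that opts out of GPTBot and CCBot but says nothing
# about the others (unlisted agents default to "allowed").
robots = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""
print(ai_optout_status(robots, "https://example.com/article"))
```

A crawler that honors these directives would skip the blocked paths; whether a given AI company's crawler actually does so is exactly the kind of transparency question this checklist asks you to investigate.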

The Path Forward

The AI revolution was supposed to augment human creativity, not replace it. It was meant to democratize access to powerful tools, not concentrate wealth in the hands of a few tech giants who built their empires on others’ work.

Thousands of artists have signed statements against AI content scraping, including major figures like Hans Haacke and Deborah Butterfield. The creative community is fighting back, but they need our support.

What You Can Do

  1. Choose Ethical AI Tools: Support companies that properly license their training data
  2. Demand Transparency: Ask tough questions about how AI systems are trained
  3. Support Creators: Use platforms that compensate artists and writers fairly
  4. Advocate for Change: Support legislation like the Generative AI Copyright Disclosure Act
  5. Use Attribution: When using AI tools, acknowledge the human creativity that made them possible

The Bottom Line

The next time you use an AI tool to generate an image, write text, or solve a problem, remember: you’re not just using a neutral technology. You’re participating in an ecosystem that was built on the dreams, sweat, and creativity of millions of human beings — most of whom never consented to have their work used this way.

The question isn’t whether AI will transform our world — it already has. The question is whether we’ll demand that this transformation be built on consent, respect, and fairness, or continue to accept that innovation justifies exploitation.

“I can’t watch this stuff and find it interesting,” Miyazaki said about AI. Perhaps it’s time we stopped finding it so interesting, too, and started demanding something better.

The future of AI doesn’t have to be built on stolen dreams. But changing course requires us to acknowledge the contradictions we’ve been willing to ignore — and demand better from the systems shaping our world.

Sources and References

  1. AP News — Anthropic Settlement
  2. Washington Post — Anthropic Copyright Deal
  3. Reuters — ANI vs OpenAI
  4. TechCrunch — Studio Ghibli AI Controversy
  5. Hollywood Reporter — Artists Sue AI Generators
  6. The Art Newspaper — Artists Statement Against AI
  7. Business Insider — OpenAI Destroyed Datasets
  8. Aftermath — Stop Sharing Ghibli AI
  9. Artnet News — Artists Sue AI Generators
  10. Stanford Internet Observatory — LAION-5B CSAM Study
  11. 404 Media — LAION Datasets Removed
  12. TechCrunch — LAION Cleaned Dataset
  13. FedScoop — LAION Federal Research Risks
  14. TechPolicy.Press — LAION Original Sin
  15. LAION Official Blog — Re-LAION-5B Release


Published via Towards AI



Note: Content contains the views of the contributing authors and not Towards AI.