CAPTCHAs v/s MACHINES: A Bitter Rivalry?

Last Updated on July 20, 2023 by Editorial Team

Author(s): Daksh Trehan

Originally published on Towards AI.

Machine Learning, Cybersecurity

And how to crack CAPTCHA using Machine Learning!

Technology never stops amazing me. Sometimes it hooks me on weird-yet-interesting short videos; other times it asks me to prove, ‘I’m a human!’

You book flight tickets, you face a CAPTCHA. You create an account, you face a CAPTCHA. You check your article for plagiarism, CAPTCHA again!

Sometimes, I want to yell, YES! I am a Robot. (Well, obviously I am a human 🙄)

Other times, I wonder who actually gets all the mountains, bikes, fire hydrants, and cycles right on the first pass?

What is CAPTCHA? Why do we use it? Is it getting harder?

CAPTCHA stands for Completely Automated Public Turing Test to Tell Computers & Humans Apart.

In the early 21st century, when Yahoo! was booming, the company was afraid that one day users would write code to create millions of fake accounts and use them to spam. To stop the spammers, it needed a mechanism that could tell human users apart from automated scripts.

The mechanism had to be a test that computers could not pass but could still grade. I told you technology is weird-yet-interesting.

At that time, because of weaker hardware and less exposure to Machine Learning and Python, computers were poor at recognizing text. We humans, on the other hand, are experts at text recognition; after all, we read text all day long.

Luis von Ahn developed CAPTCHA. The computer generates a random piece of text, so it already knows the answer, and then warps the image of that text to make it difficult for other computers to read.

The test helped to differentiate between humans and bots. But it didn’t hold up in the long run; computers soon learned to read warped text and kept getting better at it.

The same problem arose again: computers were now smart enough to bypass the test and, with traffic increasing, a more robust mechanism was required.

Re-CAPTCHA

It was very similar to CAPTCHA, but instead of one piece of text, the challenge now showed two words.

The computer knows the answer to the first word, but the second word is pulled at random from a scanned article or book. The assumption was that if a human answered the first word correctly, there was a high chance the second answer would be right too!

To settle on the second word, the same CAPTCHA is sent to many users and the majority answer is accepted. But soon this method ran its course as well, and computers were able to crack Re-CAPTCHA too.
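The majority check for the unknown word can be sketched in a few lines. This is only an illustration of the idea, not how Re-CAPTCHA was actually implemented: the function name `consensus_answer` and the `min_votes` and `min_share` thresholds are assumptions made up for this example.

```python
from collections import Counter


def consensus_answer(answers, min_votes=3, min_share=0.6):
    """Return the agreed transcription of the unknown word, or None.

    `answers` should only contain responses from users who already
    typed the known control word correctly.
    """
    # Normalize so that "Morning " and "morning" count as the same vote.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough trusted answers yet
    word, votes = Counter(cleaned).most_common(1)[0]
    # Accept the word only when a clear majority agrees on it.
    return word if votes / len(cleaned) >= min_share else None


trusted_answers = ["morning", "Morning ", "morning", "mourning", "morning"]
print(consensus_answer(trusted_answers))
```

Here four of the five trusted answers agree, so the word is accepted; with too few answers or no clear majority, the function returns `None` and the word would simply be shown to more users.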

They cracked it so thoroughly that, according to a test conducted by Google, humans solved Re-CAPTCHA only 33% of the time, while AI did it with 99.8% accuracy.

Re-CAPTCHA(v2)

This time the approach was different: humans were expected to teach machines about real-world entities.

We all remember the fire hydrant, bus, cycle, and bike tests, right?

When we pick the correct images, we are teaching the machine what a real-world entity looks like. Our input is recorded and used to help self-driving cars better recognize these entities.

But, guess what? AI is getting better at it too!

Re-CAPTCHA(v3)

By this time, humans had lost all hope of, and patience for, creating a robust test.

Now, users are verified based on their behavior. This is an invisible test that users are unaware of; it runs quietly behind your web pages to determine whether you’re a human or a bot.

Privacy is a myth, for sure! 🙂

The test can track your clicks, your typing speed, and your workflow, and it makes its judgment based on them. If you show unusual behavior, such as typing hundreds of words in a second or clicking very frequently, it prompts a Re-CAPTCHA(v2) challenge and asks you to verify.
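To make the idea concrete, here is a toy behavior score. The real signals and weights used by Re-CAPTCHA(v3) are not public, so everything below is hypothetical: the `Session` fields, the rate limits, the penalties, and the `0.5` threshold are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    words_typed: int
    clicks: int
    seconds: float


def human_score(s: Session) -> float:
    """Toy score in [0, 1]; higher means more human-like."""
    seconds = max(s.seconds, 1e-6)  # avoid division by zero
    words_per_sec = s.words_typed / seconds
    clicks_per_sec = s.clicks / seconds
    score = 1.0
    if words_per_sec > 5:    # hypothetical limit, roughly 300 words per minute
        score -= 0.6
    if clicks_per_sec > 10:  # hypothetical limit for click frequency
        score -= 0.4
    return max(score, 0.0)


def needs_challenge(s: Session, threshold: float = 0.5) -> bool:
    """Fall back to a visible challenge when the score is too low."""
    return human_score(s) < threshold


reader = Session(words_typed=40, clicks=12, seconds=60.0)
script = Session(words_typed=300, clicks=50, seconds=1.0)
print(needs_challenge(reader), needs_challenge(script))
```

A slow, steady visitor keeps a high score and never sees a puzzle, while a session that types hundreds of words in a second drops below the threshold and gets challenged.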

How Did Machine Learning Crack CAPTCHA?

By now, you must have understood that cracking CAPTCHA with Machine Learning isn’t a biggie. All you need to do is build a simple OCR model with the required data.

The training data can be found on GitHub.

The dataset consists of 1040 images.
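Before an OCR model can learn anything, each CAPTCHA's text has to be turned into numbers. The sketch below assumes that every image's file name is its answer (for example `2b827.png` contains the text `2b827`); that naming scheme, the folder name, and the helper names are assumptions for this example, not something the article states.

```python
from pathlib import Path


def label_from_path(path):
    """Assumption: the file name without its extension is the CAPTCHA text."""
    return Path(path).stem


def build_vocab(labels):
    """Map each distinct character to an integer index (sorted for stability)."""
    chars = sorted({ch for label in labels for ch in label})
    return {ch: i for i, ch in enumerate(chars)}


def encode(label, char_to_idx):
    """Turn a label string into a list of integer indices."""
    return [char_to_idx[ch] for ch in label]


def decode(indices, char_to_idx):
    """Turn a list of integer indices back into the label string."""
    idx_to_char = {i: ch for ch, i in char_to_idx.items()}
    return "".join(idx_to_char[i] for i in indices)


# Hypothetical file names standing in for the real dataset.
paths = ["captcha_images/2b827.png", "captcha_images/x3deb.png"]
labels = [label_from_path(p) for p in paths]
vocab = build_vocab(labels)
encoded = encode(labels[0], vocab)
print(labels, encoded, decode(encoded, vocab))
```

The same `vocab` is needed again at prediction time, so that the model's integer outputs can be mapped back to characters.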

Visualizing the data

The Model

Training our model
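Whatever model is used, the images are usually split into a training set and a validation set first. A minimal sketch, assuming a 90/10 split: the ratio, the seed, and the placeholder file names are assumptions for this example and are not taken from the original code.

```python
import random


def split_dataset(items, train_fraction=0.9, seed=42):
    """Shuffle a copy of `items` and split it into (train, validation)."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder names standing in for the 1040 dataset images.
samples = [f"img_{i:04d}.png" for i in range(1040)]
train, val = split_dataset(samples)
print(len(train), len(val))
```

Keeping the validation images out of training is what lets you check whether the model reads unseen CAPTCHAs or has just memorized the ones it was shown.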

Predicting output
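The prediction code is not shown here, so the following is only a sketch of one common approach. It assumes the model was trained with a CTC loss, a frequent choice for CAPTCHA OCR, and that it outputs one probability row per time step with the blank symbol as the last class. The function name, the alphabet, and the numbers are made up for illustration.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=None):
    """Greedy CTC decoding: best class per frame, merge repeats, drop blanks."""
    if blank is None:
        blank = len(alphabet)  # assumption: the blank is the last class
    # Index of the most likely class at every time step.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best:
        # Emit a character only when the class changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


alphabet = "ab2"  # classes 0, 1, 2; class 3 is the blank
frame_probs = [
    [0.90, 0.05, 0.03, 0.02],  # a
    [0.80, 0.10, 0.05, 0.05],  # a (repeat, merged)
    [0.10, 0.10, 0.10, 0.70],  # blank
    [0.70, 0.10, 0.10, 0.10],  # a (new, because a blank came in between)
    [0.10, 0.80, 0.05, 0.05],  # b
    [0.10, 0.60, 0.10, 0.20],  # b (repeat, merged)
    [0.05, 0.05, 0.80, 0.10],  # 2
]
print(ctc_greedy_decode(frame_probs, alphabet))  # aab2
```

The blank is what allows doubled characters: without it, the two separate `a` predictions would collapse into one.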

The code can be found at: Solving CAPTCHA using ML

Conclusion

Hopefully, this article has given you some insight into CAPTCHAs.

This work was created as an academic/fun project and is not intended to be used for harmful or malicious purposes.

References:

[1] OCR Model for reading CAPTCHA.

Find me on Web: www.dakshtrehan.com

Follow me at LinkedIn: www.linkedin.com/in/dakshtrehan

Read my Tech blogs: www.dakshtrehan.medium.com

Connect with me at Instagram: www.instagram.com/_daksh_trehan_

Want to learn more?

How is YouTube using AI to recommend videos?
Detecting COVID-19 Using Deep Learning
The Inescapable AI Algorithm: TikTok
GPT-3 Explained to a 5-year old.
Tinder+AI: A perfect Matchmaking?
An insider’s guide to Cartoonization using Machine Learning
How Google made “Hum to Search?”
One-line Magical code to perform EDA!
Give me 5-minutes, I’ll give you a DeepFake!

Cheers

Published via Towards AI
