CAPTCHAs v/s MACHINES: A Bitter Rivalry?
Last Updated on July 20, 2023 by Editorial Team
Author(s): Daksh Trehan
Originally published on Towards AI.
Machine Learning, Cybersecurity
And how to crack CAPTCHA using Machine Learning!
I am kind of amazed by technology: sometimes it hooks me on weird-yet-interesting short videos, and other times it asks me to prove, "I'm a human!"
You book flight tickets, you face a CAPTCHA. You create an account, you face a CAPTCHA. You check your article for plagiarism, CAPTCHA again!
Sometimes, I want to yell, YES! I am a Robot. (Well, obviously I am a human 🙄)
Other times, I wonder who gets all mountains/bikes/fire hydrants/cycles in the first pass?
What is CAPTCHA? Why do we use it? And is it getting harder?
CAPTCHA stands for Completely Automated Public Turing Test to Tell Computers & Humans Apart.
In the early 21st century, when Yahoo! was booming, the company feared that a day would come when users wrote code to create millions of fake accounts for spamming. To stop the spammers, it needed a mechanism to tell human users apart from automated scripts.
The mechanism had to be a test that computers couldn't pass, yet computers still had to be able to grade it. I told you technology is weird-yet-interesting.
At that time, because of weaker hardware and less exposure to Machine Learning and Python, computers were poor at recognizing text. We humans, on the other hand, are experts at text recognition; after all, we read text all day long.
Luis von Ahn developed CAPTCHA, in which the computer is given a random image of text together with its answer, and the text is warped to make it difficult for computers to read.
The test helped to differentiate between humans and bots. But it didn't last: computers soon started to learn the warped text and got better at reading it.
The same problem arose again: computers were now smart enough to bypass the test, and with the increase in traffic, a more robust mechanism was required.
Re-CAPTCHA
It was very similar to CAPTCHA, but instead of one piece of text, the challenge now contained two words.
For the first word, the computer knows the answer, but the second word is pulled randomly from an article or book. The assumption is that if a human answers the first word correctly, there is a high probability that the second word is right too!
For the second word, the computer sends the same CAPTCHA to many users and takes the majority answer. But soon this method ran its course as well, and computers were still able to crack Re-CAPTCHA.
They beat this method so thoroughly that, according to a test conducted by Google, humans solved Re-CAPTCHA only 33% of the time, while AI solved it with an accuracy of 99.8%.
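The two-word scheme described above can be sketched in a few lines of Python. This is only a toy illustration: the class name `TwoWordCaptcha`, the `min_votes` and `min_agreement` thresholds, and the image id are all made up for the example and are not taken from the real Re-CAPTCHA service.

```python
from collections import Counter, defaultdict


class TwoWordCaptcha:
    """Toy model of a two-word challenge: one control word, one unknown word."""

    def __init__(self, min_votes=3, min_agreement=0.6):
        # Hypothetical thresholds, chosen only for illustration.
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        # Maps an unknown-word image id to a Counter of submitted answers.
        self.votes = defaultdict(Counter)

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Return True if the user passes; record their guess for the unknown word."""
        if control_answer.strip().lower() != control_truth.strip().lower():
            # The known word is wrong, so the second answer is not trusted.
            return False
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def consensus(self, unknown_id):
        """Return the majority answer once enough users agree, else None."""
        counts = self.votes[unknown_id]
        total = sum(counts.values())
        if total < self.min_votes:
            return None
        answer, top = counts.most_common(1)[0]
        return answer if top / total >= self.min_agreement else None
```

A user who gets the control word right casts one vote for the unknown word; once enough votes agree, the unknown word is treated as solved.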
Re-CAPTCHA(v2)
This time the approach was different: humans were expected to teach machines about real-world entities.
We all remember the fire hydrant, bus, cycle, and bike tests, right?
When we choose the correct images, we are teaching the machine what a real-world entity looks like. Our input is recorded and used to help self-driving cars better understand these entities.
But, guess what? AI is getting better at it too!
Re-CAPTCHA(v3)
By this time, humans had lost all hope (and their temper) trying to create a robust test.
Now we verify the user's identity based on their behavior. This is an invisible test that users are unaware of. It runs quietly behind your web pages to determine whether you're a human or a bot.
Privacy is a myth, for sure! 🙂
The test can track your clicks, your typing speed, and your workflow, and it judges you based on those. If you show unusual behavior, such as typing hundreds of words in a second or clicking very frequently, it will prompt Re-CAPTCHA(v2) and ask you to verify.
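To make the idea concrete, here is a minimal sketch of behavior-based scoring. The real signals and weights used by Re-CAPTCHA(v3) are not public, so everything here is an assumption: the `Session` fields, the rate limits, and the `threshold` are invented for illustration. The only thing borrowed from the real service is the convention that a score near 1.0 means "probably human" and a score near 0.0 means "probably a bot".

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical summary of what a page observed about one visitor."""
    chars_typed: int
    clicks: int
    seconds: float


def human_score(s, max_chars_per_sec=15.0, max_clicks_per_sec=5.0):
    """Return a score in (0, 1]; higher means more human-like (made-up heuristic)."""
    seconds = max(s.seconds, 1e-6)  # avoid division by zero
    typing_ratio = (s.chars_typed / seconds) / max_chars_per_sec
    click_ratio = (s.clicks / seconds) / max_clicks_per_sec
    worst = max(typing_ratio, click_ratio)
    # Within the assumed human limits: full score. Beyond them: decay towards 0.
    return 1.0 if worst <= 1.0 else 1.0 / worst


def needs_challenge(s, threshold=0.5):
    """Decide whether to fall back to a visible challenge such as the image test."""
    return human_score(s) < threshold
```

A visitor who types 300 characters in a minute passes silently, while a script that submits 5,000 characters in one second scores close to zero and is sent to the visible test.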
How did Machine Learning crack CAPTCHA?
By now, you must have understood that cracking CAPTCHA with Machine Learning isn't a biggie. All you need to do is build a simple OCR model with the required data.
The training data can be found on GitHub.
The dataset consists of 1040 images.
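Before an OCR model can learn from these images, each text label has to be turned into numbers. The sketch below assumes that every image file is named after the text it shows (for example `2b827.png`); that naming scheme and the helper names are assumptions for illustration, so adjust them to however your copy of the dataset stores its labels.

```python
from pathlib import Path


def labels_from_filenames(paths):
    """Assume each file is named after its CAPTCHA text, e.g. 'data/2b827.png'."""
    return [Path(p).stem for p in paths]


def build_vocab(labels):
    """Map every distinct character to an integer id.

    Id 0 is left unused so it can serve as a padding or "blank" symbol later.
    """
    chars = sorted(set("".join(labels)))
    char_to_id = {ch: i + 1 for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char


def encode(label, char_to_id):
    """Turn a label string into the list of integer ids a model trains on."""
    return [char_to_id[ch] for ch in label]
```

With the vocabulary in place, `encode("2b827", char_to_id)` gives the integer sequence for one image, and `id_to_char` turns model output back into text.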
Visualizing the data
The Model
Training our model
Predicting output
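OCR models of this kind are often trained with a CTC-style objective, where the network emits one probability distribution per time step and a special "blank" class separates repeated characters. The article does not show which decoder it used, so the following is only a sketch of the common greedy decoding step under that assumption; the function name and the choice of id 0 for the blank are hypothetical.

```python
def ctc_greedy_decode(probs, id_to_char, blank=0):
    """Decode per-timestep class probabilities into text.

    probs: list of time steps, each a list of class probabilities.
    id_to_char: dict mapping class id to character (the blank id is not in it).
    """
    # Step 1: pick the most likely class at every time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]

    # Step 2: collapse consecutive repeats, then drop blanks.
    out = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            out.append(id_to_char[idx])
        prev = idx
    return "".join(out)
```

The blank is what lets the model output doubled letters: the path `a a blank a` decodes to "aa", while `a a a` collapses to a single "a".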
The code can be found at: Solving CAPTCHA using ML
If you like this article, please consider subscribing to my newsletter: Daksh Trehan's Weekly Newsletter.
Conclusion
Hopefully, this article has given you some insight into CAPTCHAs.
This work was created as an academic/fun project and is not intended to be used for harmful/malicious purposes.
References:
[1] OCR Model for reading CAPTCHA.
Find me on Web: www.dakshtrehan.com
Follow me at LinkedIn: www.linkedin.com/in/dakshtrehan
Read my Tech blogs: www.dakshtrehan.medium.com
Connect with me at Instagram: www.instagram.com/_daksh_trehan_
Want to learn more?
How is YouTube using AI to recommend videos?
Detecting COVID-19 Using Deep Learning
The Inescapable AI Algorithm: TikTok
GPT-3 Explained to a 5-year old.
Tinder+AI: A perfect Matchmaking?
An insider's guide to Cartoonization using Machine Learning
How Google made "Hum to Search?"
One-line Magical code to perform EDA!
Give me 5-minutes, I'll give you a DeepFake!
Cheers