CAPTCHAs vs. MACHINES: A Bitter Rivalry?
Last Updated on June 23, 2021 by Editorial Team
Author(s): Daksh Trehan
Machine Learning, Cybersecurity
And how to crack CAPTCHA using Machine Learning!
Technology amazes me. Sometimes it hooks me on weird-yet-interesting short videos; other times it asks me to prove, "I'm a human!"
You book flight tickets, you face a CAPTCHA. You create an account, you face a CAPTCHA. You check your article for plagiarism, CAPTCHA again!
Sometimes I want to yell, YES! I am a robot. (Well, obviously I am a human.)
Other times, I wonder who gets all the mountains/bikes/fire hydrants/cycles right on the first pass?
What is CAPTCHA? And why do we use it? Are they getting harder?
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.
In the early 2000s, when Yahoo! was booming, the company feared that people would write code to create millions of fake accounts for spamming. To stop the spammers, it needed a mechanism to tell human users apart from automated scripts.
The mechanism had to be a test that computers could not pass, yet computers still had to be able to grade it. I told you technology is weird-yet-interesting.
At that time, with weaker hardware and little exposure to Machine Learning, computers were poor at recognizing text. We humans, on the other hand, are experts at it; reading text is what we do all day long.
Luis von Ahn developed CAPTCHA: the computer generates a random piece of text, keeps the answer, and shows the user a warped image of that text, which makes it difficult for other computers to read.
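To make the "computers can't pass it, but they can grade it" idea concrete, here is a minimal sketch of the server side. The function names, alphabet, and length are my own assumptions for illustration, and rendering the warped image is left out entirely.

```python
import hmac
import secrets
import string

# Hypothetical alphabet and length; real CAPTCHAs choose their own.
ALPHABET = string.ascii_lowercase + string.digits


def new_challenge(length=5):
    """Return a random answer string that the server keeps secret.

    A real system would render this text as a warped image and send
    only the image to the client; rendering is omitted here.
    """
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade(expected, response):
    """Grade a response: case-insensitive, ignoring surrounding spaces."""
    cleaned = response.strip().lower()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), cleaned.encode())


answer = new_challenge()
print(grade(answer, answer.upper() + " "))  # True
print(grade(answer, answer[::-1] + "x"))    # False (length differs)
```

Grading is trivial because the server already holds the answer; the whole difficulty sits in reading the image.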
The test helped differentiate between humans and bots. But it didn't last long: computers soon learned to read the warped text and kept getting better at it.
The same problem arose again: computers were now smart enough to bypass the test, and with growing traffic a more robust mechanism was required.
Re-CAPTCHA
It was very similar to CAPTCHA, but instead of one piece of text, the challenge showed two words.
The computer knew the answer for the first word, while the second word was pulled at random from an article or book. The assumption was that if a human got the first word right, the second was very likely right too!
To settle the second word, the same CAPTCHA was sent to many users and the majority answer was taken. But this method wore out too, and computers were still able to crack Re-CAPTCHA.
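The two-word trick is easy to sketch in code. This is only an illustration of the idea described above, not Google's implementation: the function names, the word IDs, and the thresholds are all made up.

```python
from collections import Counter, defaultdict

# unknown word ID -> transcriptions from users who passed the control word
votes = defaultdict(list)


def submit(control_answer, control_guess, unknown_id, unknown_guess):
    """Record a vote for the unknown word only if the control word is right.

    Returns True when the user passes the test.
    """
    passed = control_guess.strip().lower() == control_answer.lower()
    if passed:
        votes[unknown_id].append(unknown_guess.strip().lower())
    return passed


def consensus(unknown_id, min_votes=3, min_share=0.6):
    """Return the majority transcription, or None if agreement is too weak.

    min_votes and min_share are illustrative thresholds.
    """
    guesses = votes[unknown_id]
    if len(guesses) < min_votes:
        return None
    word, count = Counter(guesses).most_common(1)[0]
    return word if count / len(guesses) >= min_share else None


submit("morning", "morning", "scan-17", "overlooks")
submit("morning", "Morning", "scan-17", "overlooks")
submit("morning", "mourning", "scan-17", "overbooks")  # fails control, ignored
submit("morning", "morning", "scan-17", "overlocks")
print(consensus("scan-17"))  # overlooks
```

Only users who get the known word right are allowed to vote, so a bot that guesses blindly mostly cannot pollute the result.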
They got so good at it that, according to a test conducted by Google, humans solved Re-CAPTCHA only 33% of the time, while AI did it with an accuracy of 99.8%.
Re-CAPTCHA(v2)
This time the approach was different: humans were expected to teach machines about real-world entities.
We all remember the fire hydrant, bus, cycle, and bike tests, right?
When we pick the correct images, we are teaching the machine what a real-world entity looks like. Our input is recorded and used to help self-driving cars better understand these entities.
But guess what? AI is getting better at it too!
Re-CAPTCHA(v3)
By this time, humans had lost both hope and patience for creating a robust test.
Now the user's identity is verified based on their behavior. This is an invisible test that users are unaware of. It runs quietly behind your web pages to determine whether you're a human or a bot.
Privacy is a myth, for sure!
The test can track your clicks, your typing speed, and your workflow, and it judges you based on them. If you show unusual behavior, such as typing hundreds of words in a second or clicking very frequently, it falls back to Re-CAPTCHA(v2) and asks you to verify.
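How the real scoring works is not public, so the following is purely a toy heuristic to show the shape of the idea: turn behavior into a score, and show a visible challenge only when the score looks bot-like. Every name, limit, and threshold here is an assumption of mine.

```python
from dataclasses import dataclass


@dataclass
class Session:
    chars_typed: int
    clicks: int
    seconds: float


def human_score(s, max_chars_per_sec=15.0, max_clicks_per_sec=5.0):
    """Return a score in [0, 1]; lower means more bot-like.

    Each rate that exceeds its (made-up) limit scales the score down
    in proportion to how far it overshoots.
    """
    seconds = max(s.seconds, 1e-6)  # avoid division by zero
    typing_rate = s.chars_typed / seconds
    click_rate = s.clicks / seconds
    score = 1.0
    score *= min(1.0, max_chars_per_sec / max(typing_rate, 1e-9))
    score *= min(1.0, max_clicks_per_sec / max(click_rate, 1e-9))
    return score


def needs_challenge(s, threshold=0.5):
    """True when the session should be shown a visible challenge."""
    return human_score(s) < threshold


print(needs_challenge(Session(chars_typed=120, clicks=8, seconds=30)))  # False
print(needs_challenge(Session(chars_typed=600, clicks=40, seconds=1)))  # True
```

A production system would look at far richer signals than two rates, but the flow is the same: score first, challenge only if needed.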
How did Machine Learning crack CAPTCHA?
By now, you must have understood that cracking CAPTCHA with Machine Learning isn't a big deal. All you need to do is build a simple OCR model with the required data.
The training data can be found on GitHub.
The dataset consists of 1040 images.
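Before any model sees the images, the labels have to be turned into numbers. Here is a rough sketch of that step, assuming (as is common for CAPTCHA datasets, though not stated above) that each image's file name is its label. The file names, the directory name in the comment, and the 90/10 split are illustrative.

```python
from pathlib import Path
import random


def labels_from_paths(paths):
    """Use each file's stem (name without extension) as its label."""
    return [Path(p).stem for p in paths]


def build_vocab(labels):
    """Map every character to an integer ID, starting at 1.

    ID 0 is left free, e.g. for padding or a "no character" symbol.
    """
    chars = sorted(set("".join(labels)))
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(label, vocab):
    """Turn a label string into a list of integer IDs."""
    return [vocab[ch] for ch in label]


def split(items, train_share=0.9, seed=42):
    """Shuffle a copy and split it into training and validation parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_share)
    return items[:cut], items[cut:]


# Hypothetical file names; with real data you would list the image
# directory instead, e.g. sorted(Path("captcha_images").glob("*.png")).
paths = ["data/2b827.png", "data/x3fwf.png", "data/226md.png"]
labels = labels_from_paths(paths)
vocab = build_vocab(labels)
print(labels)  # ['2b827', 'x3fwf', '226md']
print(encode("2b827", vocab))

train, val = split(range(1040))
print(len(train), len(val))  # 936 104
```

With 1040 images, a 90/10 split leaves 936 for training and 104 for validation.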
Visualizing the data
The Model
Training our model
Predicting output
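OCR models of this kind typically output one character guess per horizontal slice of the image, plus a special "blank" class, and the final text is read off with a CTC-style rule. Assuming that setup (it is not spelled out here), a minimal pure-Python sketch of greedy decoding looks like this; the probabilities and the tiny alphabet are made up.

```python
BLANK = 0  # assumed ID of the "no character" class


def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def greedy_decode(probs, id_to_char):
    """Best-path decoding: take the top class per time step,
    merge consecutive repeats, then drop blanks."""
    best = [argmax(row) for row in probs]
    out = []
    prev = None
    for idx in best:
        if idx != prev and idx != BLANK:
            out.append(id_to_char[idx])
        prev = idx
    return "".join(out)


# Toy output for a 3-symbol alphabet over 6 time steps (made-up numbers).
id_to_char = {1: "a", 2: "b", 3: "7"}
probs = [
    [0.1, 0.8, 0.05, 0.05],   # a
    [0.1, 0.7, 0.1, 0.1],     # a (repeat, merged)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.2, 0.6, 0.1, 0.1],     # a (new, because a blank separated it)
    [0.1, 0.1, 0.7, 0.1],     # b
    [0.1, 0.1, 0.1, 0.7],     # 7
]
print(greedy_decode(probs, id_to_char))  # aab7
```

The blank is what lets the model spell double letters: without it, the two runs of "a" would collapse into one.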
The code can be found at: Solving CAPTCHA using ML
Conclusion
Hopefully, this article has given you some insight into CAPTCHAs.
This work was created as an academic/fun project and is not intended for harmful or malicious purposes.
References:
[1] OCR Model for reading CAPTCHA.
CAPTCHAs vs. MACHINES: A Bitter Rivalry? was originally published in Towards AI on Medium.
Published via Towards AI