Towards AI

The leading AI community and content platform focused on making AI accessible to all. Check out our new course platform: https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev

A background showing a captcha, and a person covered in a hoodie | Breaking captcha with machine learning

Cybersecurity, Machine Learning, Technology

Breaking CAPTCHA Using Machine Learning in 0.05 Seconds

A machine learning model breaks the CAPTCHA systems on 33 highly visited websites. The approach is based on generative adversarial networks (GANs).

Roberto Iriondo
Published in Towards AI
6 min read · Dec 19, 2018

December 19, 2018, by Roberto Iriondo — Updated May 5, 2020

Everyone despises CAPTCHAs (every human, that is, since bots do not have emotions): those annoying images of hard-to-read text that you have to type in before you can access or do “something” online.

CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) were developed to keep automated programs from misbehaving on the World Wide Web (filling out online forms, accessing restricted files, hitting a website an enormous number of times, and so on) by verifying that the end user is “human” and not a bot.

Several attacks on CAPTCHAs have been proposed in the past, but none has been as accurate and fast as the machine learning algorithm presented by a group of researchers from Lancaster University, Northwest University, and Peking University, shown below.

Figure 1: Overview of the approach. The researchers first use a small set of non-synthesized CAPTCHAs to train a CAPTCHA synthesizer. (1) The CAPTCHA synthesizer is then used to generate synthetic CAPTCHAs; (2) the synthetic CAPTCHAs are used to train a machine learning base solver; (3) the base solver is then refined to build a fine-tuned solver for non-synthesized CAPTCHAs. | [1]

One of the first known people to break CAPTCHAs was Adrian Rosebrock. In his book “Deep Learning for Computer Vision with Python” [4], Adrian walks through how he bypassed the CAPTCHA system on the E-ZPass New York website: he downloaded a large dataset of example CAPTCHA images and used it to train a deep learning model that breaks the CAPTCHAs.

The main difference between Adrian’s solution and that of the researchers from Lancaster, Northwest, and Peking is that the researchers did not need to download a large image dataset to break the CAPTCHA systems. Au contraire, they used the concept of a generative adversarial network (GAN) to create synthesized CAPTCHAs and combined them with a small dataset of real CAPTCHAs to build an extremely fast and accurate CAPTCHA solver.
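To make the workflow concrete, here is a toy sketch of the same three stages: synthesize plenty of labelled data, train a base solver on it, then fine-tune with a handful of real samples. It is not the researchers' implementation. The GAN synthesizer and the neural solver are swapped for deliberately trivial stand-ins (a noisy template generator and a nearest-centroid classifier), and every name and number in it (TEMPLATES, STYLE_SHIFT, NOISE, the fine-tuning rate) is a hypothetical choice made for illustration.

```python
import random

# Hypothetical 2-feature "glyph descriptors" standing in for CAPTCHA images.
TEMPLATES = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}
STYLE_SHIFT = (0.6, 0.0)  # assumed style gap between synthetic and real data
NOISE = 0.03              # assumed per-feature noise level


def make_samples(rng, n_per_class, shift=(0.0, 0.0)):
    """Return [((x, y), label), ...] drawn around each (shifted) template."""
    sx, sy = shift
    return [
        ((tx + sx + rng.gauss(0, NOISE), ty + sy + rng.gauss(0, NOISE)), label)
        for label, (tx, ty) in TEMPLATES.items()
        for _ in range(n_per_class)
    ]


def class_means(samples):
    """Average the feature vectors of each label."""
    sums = {}
    for (x, y), label in samples:
        ax, ay, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (ax + x, ay + y, n + 1)
    return {label: (ax / n, ay / n) for label, (ax, ay, n) in sums.items()}


def train_base_solver(synthetic):
    """Stage 2: fit the solver (one centroid per label) on synthetic data."""
    return class_means(synthetic)


def fine_tune(solver, real, rate=0.8):
    """Stage 3: pull each centroid toward the mean of the few real samples."""
    tuned = dict(solver)
    for label, (rx, ry) in class_means(real).items():
        bx, by = solver[label]
        tuned[label] = ((1 - rate) * bx + rate * rx, (1 - rate) * by + rate * ry)
    return tuned


def predict(solver, point):
    """Return the label whose centroid is closest to the point."""
    x, y = point
    return min(
        solver,
        key=lambda lb: (x - solver[lb][0]) ** 2 + (y - solver[lb][1]) ** 2,
    )


def accuracy(solver, samples):
    return sum(predict(solver, p) == label for p, label in samples) / len(samples)


rng = random.Random(0)
synthetic = make_samples(rng, 200)                 # stage 1: cheap and plentiful
few_real = make_samples(rng, 5, STYLE_SHIFT)       # scarce labelled real data
held_out = make_samples(rng, 30, STYLE_SHIFT)      # real data for evaluation

base_solver = train_base_solver(synthetic)
tuned_solver = fine_tune(base_solver, few_real)

base_acc = accuracy(base_solver, held_out)
tuned_acc = accuracy(tuned_solver, held_out)
print(f"base solver on real data:       {base_acc:.2f}")
print(f"fine-tuned solver on real data: {tuned_acc:.2f}")
```

In this toy setup the base solver stumbles on real samples because their style is shifted away from the synthetic ones, and a few real examples are enough to close most of that gap. That is the intuition behind pairing a synthesizer with a small real dataset, although a real solver would operate on images with a neural network rather than on two hand-picked features.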
