DoodlAI - Build a Real-Time Doodle Recognition AI with CNN

Last Updated on September 29, 2025 by Editorial Team

Author(s): Abinaya Subramaniam

Originally published on Towards AI.

Have you ever wondered if a computer could recognize your doodles of cats, trees, cars, or even clocks as you draw them? That’s exactly what DoodlAI does. In this blog, I’ll take you step by step through building DoodlAI, a web application that uses deep learning to recognize hand-drawn sketches in real time.

Image by Author

What is DoodlAI?

DoodlAI is an interactive platform where users can draw sketches, and the AI predicts the category of the drawing instantly. The system uses a Convolutional Neural Network (CNN), a type of deep learning model, trained on a dataset of doodles to recognize drawings such as:

  • Animals: cat, dog
  • Objects: car, house, clock
  • Fruits: apple, banana
  • Nature: tree

The AI then predicts the drawing in real time, making it a fun tool to explore deep learning in action.

DoodlAI — Interface — Image by Author

Explore the Project on GitHub

If you want to see the full code, download the dataset, or try running the project yourself, check out the DoodlAI repository on GitHub:

https://github.com/Abinaya-Subramaniam/DoodlAI

The repository includes:

  • The complete CNN model code
  • Data preprocessing scripts
  • Training and evaluation notebooks
  • Instructions to run the project locally or in Google Colab

Feel free to clone the repository, experiment with the model, or even contribute improvements!

Now, we’ll go through the steps to build the project.

What is a CNN?

A Convolutional Neural Network (CNN) is a type of deep learning model designed to process images. Unlike traditional neural networks, CNNs can automatically detect patterns like edges, shapes, and textures without us manually extracting features.

Key components of a CNN:

  1. Convolutional Layers: Apply filters to detect features in images, such as edges or curves.
  2. Pooling Layers: Reduce the size of the image while retaining important information, which helps the network learn efficiently.
  3. Activation Functions (ReLU): Introduce non-linearity so the network can model complex patterns.
  4. Dropout Layers: Randomly disable some neurons during training to prevent overfitting.
  5. Fully Connected Layers: Combine extracted features to classify the image into a category.

In short, CNNs mimic how the human visual system works: starting from simple lines and edges, they build up to complex shapes like a cat’s face or a tree.
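To make the convolution idea concrete, here is a tiny NumPy sketch (not from the project code) of a single filter sliding over an image. The vertical-edge kernel is a hypothetical example: its response is large only where pixel values change from left to right.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over the image (valid padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output pixel is the sum of the kernel times the patch under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A dark-to-bright boundary down the middle of a tiny image
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A simple vertical-edge detector: responds where the right pixel exceeds the left
edge_kernel = np.array([[-1, 1]], dtype=float)
response = conv2d(image, edge_kernel)
```

The response map is zero everywhere except along the column where the edge sits, which is exactly the kind of feature a first convolutional layer learns to pick out.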

Step 1: Setting Up the Environment

Before we build the model, we need to install some libraries. These libraries will help us:

  • TensorFlow/Keras: Build and train neural networks
  • NumPy: Handle large arrays of data
  • Matplotlib: Visualize images and graphs
  • OpenCV/Pillow: Work with images
  • Flask-Ngrok: Run web applications from Colab

!pip install tensorflow keras numpy matplotlib opencv-python pillow flask-ngrok

Step 2: Understanding the Data

We need a dataset of doodles to teach our AI. We use the Google QuickDraw dataset, which contains hundreds of thousands of doodles drawn by people around the world.

We focus on 8 categories:

CATEGORIES = ['cat', 'dog', 'house', 'tree', 'car', 'apple', 'banana', 'clock']

Each category contains 28×28 pixel grayscale images, which are tiny black-and-white images perfect for training our model.

Step 3: Downloading and Preprocessing the Data

We need to:

  1. Download the doodle files from Google QuickDraw.
  2. Convert them into arrays the AI can understand.
  3. Normalize the data so values range between 0 and 1.

Here’s what the code does:

import os
import requests
import numpy as np
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

def download_quickdraw_data():
    base_url = "https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/"
    data_dir = 'quickdraw_data'
    os.makedirs(data_dir, exist_ok=True)

    X, y = [], []

    for i, category in enumerate(CATEGORIES):
        filename = f"{category.replace(' ', '%20')}.npy"
        filepath = os.path.join(data_dir, filename)
        if not os.path.exists(filepath):
            response = requests.get(base_url + filename, stream=True)
            with open(filepath, 'wb') as f:
                for chunk in response.iter_content(chunk_size=1024):
                    if chunk:
                        f.write(chunk)

        # Limit to 10,000 samples per category, reshape to 28x28x1, normalize to [0, 1]
        category_data = np.load(filepath)[:10000]
        category_data = category_data.reshape(-1, 28, 28, 1).astype('float32') / 255.0
        X.append(category_data)
        y.append(np.full(len(category_data), i))

    X = np.vstack(X)
    y = np.hstack(y)
    y = to_categorical(y, num_classes=len(CATEGORIES))

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42, stratify=y_train)

    return X_train, X_val, X_test, y_train, y_val, y_test

X_train, X_val, X_test, y_train, y_val, y_test = download_quickdraw_data()

The download_quickdraw_data() function handles the entire process of preparing the Google QuickDraw dataset for training a Convolutional Neural Network (CNN).

First, it downloads doodle files for each category (cat, dog, tree, etc.) if they are not already stored locally. Each doodle image is loaded, limited to 10,000 samples per category, reshaped to a 28×28 pixel grayscale format with a single channel, and normalized so that pixel values fall between 0 and 1.

This normalization helps the CNN learn more efficiently. For each category, a corresponding numeric label is created, and all images and labels are combined into unified arrays suitable for model training.

After preprocessing, the function splits the dataset into training, validation, and test sets: 20% is held out for testing, and 10% of the remaining data is used for validation (roughly a 72–8–20 split overall).

Labels are converted to one-hot encoded vectors, which is necessary for multi-class classification with categorical cross-entropy loss. The returned arrays X_train, X_val, X_test, y_train, y_val, y_test are ready to feed into a CNN, allowing the model to learn patterns from the doodles, validate its performance during training, and finally evaluate accuracy on unseen test data.
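One-hot encoding is easy to illustrate without Keras. This standalone sketch mirrors what to_categorical produces for a few hypothetical label indices:

```python
import numpy as np

CATEGORIES = ['cat', 'dog', 'house', 'tree', 'car', 'apple', 'banana', 'clock']

def one_hot(labels, num_classes):
    # Equivalent in effect to Keras' to_categorical:
    # row i is all zeros except a 1 in column labels[i]
    return np.eye(num_classes)[labels]

labels = np.array([0, 3, 7])               # cat, tree, clock
encoded = one_hot(labels, len(CATEGORIES))  # shape (3, 8)
```

Each row sums to exactly one, which is what the softmax output layer and categorical cross-entropy loss expect to match against.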

Step 4: Visualizing the Doodles

Before training, it’s fun and important to see what we’re working with:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train[i].reshape(28, 28), cmap='gray')
    plt.title(CATEGORIES[np.argmax(y_train[i])])
    plt.axis('off')
plt.tight_layout()
plt.show()

You’ll see little sketches of cats, cars, trees, and more.

Image by Author

Step 5: Building the CNN Model

A Convolutional Neural Network (CNN) is a type of AI that is excellent at recognizing images. Think of it like this:

  1. Convolution layers → Detect patterns (lines, curves, shapes)
  2. Pooling layers → Reduce image size to focus on important features
  3. Dropout layers → Prevent overfitting (helps the AI generalize better)
  4. Dense layers → Make the final decision about which category the image belongs to

Here’s our model:

model = create_improved_model(input_shape=(28,28,1), num_classes=len(CATEGORIES))
model.summary()

Our CNN model consists of five convolutional layers, each followed by batch normalization and dropout layers. The convolutional layers act as hierarchical feature extractors, progressively learning from simple patterns like lines and edges in the first layers to more complex shapes and textures in deeper layers.

Batch normalization stabilizes the learning process by normalizing the outputs of each layer, which helps the network train faster and more reliably.

Dropout layers randomly deactivate a fraction of neurons during training, preventing the model from overfitting and ensuring it generalizes well to unseen doodles.

After the convolutional and pooling layers have extracted meaningful features, the network flattens the output into a one-dimensional vector and passes it through dense (fully connected) layers. These dense layers integrate all the extracted features to make the final prediction, determining the category of the doodle.

In total, the model has approximately 540,000 trainable parameters, a carefully chosen size that balances computational efficiency and learning capacity. This architecture enables the CNN to effectively learn and differentiate between doodle categories such as cats, cars, trees, and more, while maintaining strong generalization performance on new, unseen drawings.
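The exact architecture lives in the repository; as a sketch, a create_improved_model matching the description above (five convolutional layers with batch normalization and dropout, then dense layers) might look like this. The layer widths and dropout rates here are illustrative assumptions, not the repository's exact values:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def create_improved_model(input_shape=(28, 28, 1), num_classes=8):
    # Hypothetical reconstruction: five conv layers interleaved with
    # batch normalization, pooling, and dropout, then dense layers.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Conv2D(128, 3, padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

With these illustrative sizes the parameter count lands in the same ballpark as the ~540,000 quoted above; tweaking filter counts and the dense width shifts that trade-off between capacity and speed.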

Step 6: Data Augmentation

When training a neural network, one common challenge is that the model might memorize the exact training images rather than learning the general patterns. This is called overfitting, and it leads to poor performance on new, unseen data. One powerful technique to combat this is data augmentation, which artificially expands the dataset by creating slightly modified versions of existing images.

In our project, we use image transformations such as:

  • Rotation: The doodle is rotated slightly (e.g., ±10 degrees). This helps the model recognize sketches even if the user draws them at a slight angle.
  • Zoom: The image is scaled up or down slightly. This ensures that the model can handle doodles of different sizes.
  • Width and Height Shifts: The doodle is moved slightly left/right or up/down. This prevents the model from being sensitive to the exact placement of the drawing in the canvas.
  • Shear (Tilt): The image is tilted slightly, simulating minor distortions that might occur when a user draws freely.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1
)

This helps the model recognize doodles even if they’re drawn slightly differently.

Step 7: Training the Model

Once the data is preprocessed and the model is defined, the next step is training the CNN. Training means letting the model learn patterns from the doodles by adjusting its internal parameters (weights) to minimize errors in predictions. This process involves feeding the network batches of images and updating the weights using an optimization algorithm like Adam.

To make training more efficient and prevent overfitting, we use callbacks: special functions that monitor the training process and take actions automatically.

  • EarlyStopping: If the model stops improving on the validation set for several epochs, training is halted. This prevents wasting time and reduces overfitting by not training longer than necessary.
  • ModelCheckpoint: This saves the model’s weights whenever the validation accuracy improves. At the end of training, we have the best performing version of the model saved.
  • ReduceLROnPlateau: If the model stops improving, this callback reduces the learning rate. A smaller learning rate helps the network make finer adjustments to its weights and escape plateaus during training.
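The callbacks list passed to model.fit isn't shown in the excerpt; a plausible definition using the three Keras callbacks just described looks like this (monitors, patience values, and the learning-rate floor are illustrative assumptions):

```python
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

callbacks = [
    # Stop when validation loss stalls, keeping the best weights seen so far
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    # Save the model whenever validation accuracy improves
    ModelCheckpoint('best_doodle_model.h5', monitor='val_accuracy',
                    save_best_only=True),
    # Halve the learning rate when validation loss plateaus
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-5),
]
```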

history = model.fit(
    datagen.flow(X_train, y_train, batch_size=128),
    steps_per_epoch=len(X_train) // 128,
    epochs=50,
    validation_data=(X_val, y_val),
    callbacks=callbacks
)

After training, we achieve ~95% test accuracy, which is excellent for doodle recognition.

Step 8: Evaluating the Model

After training, the next crucial step is evaluating the CNN to understand how well it can recognize new doodles it has never seen before. This helps us verify that the model has learned meaningful patterns rather than just memorizing the training data.

test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")

  • X_test and y_test contain doodles and their corresponding labels that the model hasn’t seen during training.
  • test_loss indicates how far off the model’s predictions are from the true labels.
  • test_acc shows the fraction of correct predictions. In our case, the model typically achieves ~95% accuracy, meaning it correctly identifies 95 out of 100 doodles on average.

We also visualize:

  • Accuracy and Loss over epochs
  • Confusion matrix to see which categories are confused

Evaluation — Image by Author

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=CATEGORIES, yticklabels=CATEGORIES)

Confusion Matrix — Image by Author
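The cm matrix fed to sns.heatmap is the confusion matrix. A minimal sketch of computing it with scikit-learn, using small hypothetical label arrays in place of the real test-set results (in the project, y_pred would come from np.argmax(model.predict(X_test), axis=1)):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical stand-ins for true and predicted class indices
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])

# cm[i, j] counts samples of true class i predicted as class j,
# so correct predictions sit on the diagonal
cm = confusion_matrix(y_true, y_pred)
```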

Finally, we display some predictions vs. true labels to see the model in action.

Predictions — Image by Author

Step 9: Saving the Model

Once happy, we save our trained model for future use:

model.save('best_doodle_model.h5')

You can now load this model anytime to make predictions.
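Loading the saved model and querying it takes just a few lines. This self-contained sketch builds a trivial stand-in network so it runs anywhere, but the load_model and predict calls are the same ones you'd use with the real best_doodle_model.h5:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

CATEGORIES = ['cat', 'dog', 'house', 'tree', 'car', 'apple', 'banana', 'clock']

# Stand-in model so the example is self-contained; in practice you would
# skip this and load the file saved in Step 9 directly.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(len(CATEGORIES), activation='softmax'),
])
model.save('best_doodle_model.h5')

loaded = tf.keras.models.load_model('best_doodle_model.h5')

# A doodle arrives as a 28x28 grayscale array with values in [0, 1]
doodle = np.random.rand(1, 28, 28, 1).astype('float32')
probs = loaded.predict(doodle, verbose=0)[0]
predicted = CATEGORIES[int(np.argmax(probs))]
```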

Step 10: What’s Next? Deploying DoodlAI

With the model ready, the next step is building a web application:

  1. Backend: FastAPI + TensorFlow/Keras → Serve predictions in real-time
  2. Frontend: React → Canvas for drawing, game/free draw modes
  3. Deployment: Host on a web server, making it accessible to everyone

Users can then draw doodles on the web, and the AI will instantly predict the category with a confidence score.
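On the backend, each canvas submission has to be converted into the 28×28 normalized input the model expects. Here is a hypothetical preprocessing helper using block-average downsampling as a stand-in for proper PIL or OpenCV resizing; the function name and canvas size are illustrative, not the project's actual API:

```python
import numpy as np

def preprocess_canvas(canvas, size=28):
    """Downsample a square grayscale canvas (0-255) to the model's 28x28 input."""
    canvas = np.asarray(canvas, dtype='float32')
    h, w = canvas.shape
    bh, bw = h // size, w // size
    # Average each (bh x bw) block of pixels into one output pixel
    small = canvas[:size * bh, :size * bw].reshape(size, bh, size, bw).mean(axis=(1, 3))
    small /= 255.0                           # normalize to [0, 1], as in training
    return small.reshape(1, size, size, 1)   # add batch and channel dimensions

canvas = np.zeros((280, 280))                # e.g. a 280x280 drawing canvas
canvas[100:180, 100:180] = 255               # a bright square "doodle"
batch = preprocess_canvas(canvas)            # ready for model.predict(batch)
```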

DoodlAI — Image by Author
Predictions — Image by Author

Conclusion

DoodlAI is a fun and beginner-friendly project to learn deep learning and AI deployment. You’ll understand:

  • How CNNs recognize images
  • How to preprocess and augment data
  • How to train and evaluate models
  • How to save and deploy an AI model

It’s an exciting way to combine coding, AI, and creativity!


Published via Towards AI

