DoodlAI - Build a Real-Time Doodle Recognition AI with CNN

Last Updated on September 29, 2025 by Editorial Team

Author(s): Abinaya Subramaniam

Originally published on Towards AI.

Have you ever wondered if a computer could recognize your doodles of cats, trees, cars, or even clocks as you draw them? That’s exactly what DoodlAI does. In this blog, I’ll take you step by step through building DoodlAI, a web application that uses deep learning to recognize hand-drawn sketches in real time.

Image by Author

What is DoodlAI?

DoodlAI is an interactive platform where users can draw sketches, and the AI predicts the category of the drawing instantly. The system uses a Convolutional Neural Network (CNN), a type of deep learning model, trained on a dataset of doodles to recognize drawings such as:

  • Animals: cat, dog
  • Objects: car, house, clock
  • Fruits: apple, banana
  • Nature: tree

The AI then predicts the drawing in real time, making it a fun tool to explore deep learning in action.

DoodlAI — Interface — Image by Author

Explore the Project on GitHub

If you want to see the full code, download the dataset, or try running the project yourself, check out the DoodlAI repository on GitHub:

https://github.com/Abinaya-Subramaniam/DoodlAI

The repository includes:

  • The complete CNN model code
  • Data preprocessing scripts
  • Training and evaluation notebooks
  • Instructions to run the project locally or in Google Colab

Feel free to clone the repository, experiment with the model, or even contribute improvements!

Now, we’ll go through the steps to build the project.

What is a CNN?

A Convolutional Neural Network (CNN) is a type of deep learning model designed to process images. Unlike traditional neural networks, CNNs can automatically detect patterns like edges, shapes, and textures without us manually extracting features.

Key components of a CNN:

  1. Convolutional Layers: Apply filters to detect features in images, such as edges or curves.
  2. Pooling Layers: Reduce the size of the image while retaining important information, which helps the network learn efficiently.
  3. Activation Functions (ReLU): Introduce non-linearity so the network can model complex patterns.
  4. Dropout Layers: Randomly disable some neurons during training to prevent overfitting.
  5. Fully Connected Layers: Combine extracted features to classify the image into a category.

In short, CNNs mimic how the human visual system works: starting from simple lines and edges, they build up to complex shapes like a cat’s face or a tree.
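To make the convolution idea concrete, here is a tiny NumPy sketch (not from the project code) of a single filter sliding over an image. The vertical-edge kernel is a hypothetical example: its response is large only where pixel values change from left to right.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over the image (valid padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output pixel is the sum of the kernel times the patch under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A dark-to-bright boundary down the middle of a tiny image
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# A simple vertical-edge detector: responds where the right pixel exceeds the left
edge_kernel = np.array([[-1, 1]], dtype=float)
response = conv2d(image, edge_kernel)
```

The response map is zero everywhere except along the column where the edge sits, which is exactly the kind of feature a first convolutional layer learns to pick out.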

Step 1: Setting Up the Environment

Before we build the model, we need to install some libraries. These libraries will help us:

  • TensorFlow/Keras: Build and train neural networks
  • NumPy: Handle large arrays of data
  • Matplotlib: Visualize images and graphs
  • OpenCV/Pillow: Work with images
  • Flask-Ngrok: Run web applications from Colab

!pip install tensorflow keras numpy matplotlib opencv-python pillow flask-ngrok

Step 2: Understanding the Data

We need a dataset of doodles to teach our AI. We use the Google QuickDraw dataset, which contains hundreds of thousands of doodles drawn by people around the world.

We focus on 8 categories:

CATEGORIES = ['cat', 'dog', 'house', 'tree', 'car', 'apple', 'banana', 'clock']

Each category contains 28×28 pixel grayscale images, which are tiny black-and-white images perfect for training our model.

Step 3: Downloading and Preprocessing the Data

We need to:

  1. Download the doodle files from Google QuickDraw.
  2. Convert them into arrays the AI can understand.
  3. Normalize the data so values range between 0 and 1.

Here’s what the code does:

import os
import requests
import numpy as np
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

def download_quickdraw_data():
    base_url = "https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/"
    data_dir = 'quickdraw_data'
    os.makedirs(data_dir, exist_ok=True)

    X, y = [], []

    for i, category in enumerate(CATEGORIES):
        filename = f"{category.replace(' ', '%20')}.npy"
        filepath = os.path.join(data_dir, filename)
        if not os.path.exists(filepath):
            response = requests.get(base_url + filename, stream=True)
            with open(filepath, 'wb') as f:
                for chunk in response.iter_content(chunk_size=1024):
                    if chunk:
                        f.write(chunk)

        # Limit to 10,000 samples per category, reshape to 28x28x1, normalize to [0, 1]
        category_data = np.load(filepath)[:10000]
        category_data = category_data.reshape(-1, 28, 28, 1).astype('float32') / 255.0
        X.append(category_data)
        y.append(np.full(len(category_data), i))

    X = np.vstack(X)
    y = np.hstack(y)
    y = to_categorical(y, num_classes=len(CATEGORIES))

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42, stratify=y_train)

    return X_train, X_val, X_test, y_train, y_val, y_test

X_train, X_val, X_test, y_train, y_val, y_test = download_quickdraw_data()

The download_quickdraw_data() function handles the entire process of preparing the Google QuickDraw dataset for training a Convolutional Neural Network (CNN).

First, it downloads doodle files for each category (cat, dog, tree, etc.) if they are not already stored locally. Each doodle image is loaded, limited to 10,000 samples per category, reshaped to a 28×28 pixel grayscale format with a single channel, and normalized so that pixel values fall between 0 and 1.

This normalization helps the CNN learn more efficiently. For each category, a corresponding numeric label is created, and all images and labels are combined into unified arrays suitable for model training.

After preprocessing, the function splits the dataset into training, validation, and test sets: 20% is held out for testing, and 10% of the remaining data is used for validation (roughly a 72–8–20 split overall).

Labels are converted to one-hot encoded vectors, which is necessary for multi-class classification with categorical cross-entropy loss. The returned arrays X_train, X_val, X_test, y_train, y_val, y_test are ready to feed into a CNN, allowing the model to learn patterns from the doodles, validate its performance during training, and finally evaluate accuracy on unseen test data.
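One-hot encoding is easy to illustrate without Keras. This standalone sketch mirrors what to_categorical produces for a few hypothetical label indices:

```python
import numpy as np

CATEGORIES = ['cat', 'dog', 'house', 'tree', 'car', 'apple', 'banana', 'clock']

def one_hot(labels, num_classes):
    # Equivalent in effect to Keras' to_categorical:
    # row i is all zeros except a 1 in column labels[i]
    return np.eye(num_classes)[labels]

labels = np.array([0, 3, 7])               # cat, tree, clock
encoded = one_hot(labels, len(CATEGORIES))  # shape (3, 8)
```

Each row sums to exactly one, which is what the softmax output layer and categorical cross-entropy loss expect to match against.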

Step 4: Visualizing the Doodles

Before training, it’s fun and important to see what we’re working with:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train[i].reshape(28, 28), cmap='gray')
    plt.title(CATEGORIES[np.argmax(y_train[i])])
    plt.axis('off')
plt.tight_layout()
plt.show()

You’ll see little sketches of cats, cars, trees, and more.

Image by Author

Step 5: Building the CNN Model

A Convolutional Neural Network (CNN) is a type of AI that is excellent at recognizing images. Think of it like this:

  1. Convolution layers → Detect patterns (lines, curves, shapes)
  2. Pooling layers → Reduce image size to focus on important features
  3. Dropout layers → Prevent overfitting (helps the AI generalize better)
  4. Dense layers → Make the final decision about which category the image belongs to

Here’s our model:

model = create_improved_model(input_shape=(28,28,1), num_classes=len(CATEGORIES))
model.summary()

Our CNN model consists of five convolutional layers, each followed by batch normalization and dropout layers. The convolutional layers act as hierarchical feature extractors, progressively learning from simple patterns like lines and edges in the first layers to more complex shapes and textures in deeper layers.

Batch normalization stabilizes the learning process by normalizing the outputs of each layer, which helps the network train faster and more reliably.

Dropout layers randomly deactivate a fraction of neurons during training, preventing the model from overfitting and ensuring it generalizes well to unseen doodles.

After the convolutional and pooling layers have extracted meaningful features, the network flattens the output into a one-dimensional vector and passes it through dense (fully connected) layers. These dense layers integrate all the extracted features to make the final prediction, determining the category of the doodle.

In total, the model has approximately 540,000 trainable parameters, a carefully chosen size that balances computational efficiency and learning capacity. This architecture enables the CNN to effectively learn and differentiate between doodle categories such as cats, cars, trees, and more, while maintaining strong generalization performance on new, unseen drawings.
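The exact architecture lives in the repository; as a sketch, a create_improved_model matching the description above (five convolutional layers with batch normalization and dropout, then dense layers) might look like this. The layer widths and dropout rates here are illustrative assumptions, not the repository's exact values:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def create_improved_model(input_shape=(28, 28, 1), num_classes=8):
    # Hypothetical reconstruction: five conv layers interleaved with
    # batch normalization, pooling, and dropout, then dense layers.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.Conv2D(64, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Conv2D(128, 3, padding='same', activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

With these illustrative sizes the parameter count lands in the same ballpark as the ~540,000 quoted above; tweaking filter counts and the dense width shifts that trade-off between capacity and speed.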

Step 6: Data Augmentation

When training a neural network, one common challenge is that the model might memorize the exact training images rather than learning the general patterns. This is called overfitting, and it leads to poor performance on new, unseen data. One powerful technique to combat this is data augmentation, which artificially expands the dataset by creating slightly modified versions of existing images.

In our project, we use image transformations such as:

  • Rotation: The doodle is rotated slightly (e.g., ±10 degrees). This helps the model recognize sketches even if the user draws them at a slight angle.
  • Zoom: The image is scaled up or down slightly. This ensures that the model can handle doodles of different sizes.
  • Width and Height Shifts: The doodle is moved slightly left/right or up/down. This prevents the model from being sensitive to the exact placement of the drawing in the canvas.
  • Shear (Tilt): The image is tilted slightly, simulating minor distortions that might occur when a user draws freely.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1
)

This helps the model recognize doodles even if they’re drawn slightly differently.

Step 7: Training the Model

Once the data is preprocessed and the model is defined, the next step is training the CNN. Training means letting the model learn patterns from the doodles by adjusting its internal parameters (weights) to minimize errors in predictions. This process involves feeding the network batches of images and updating the weights using an optimization algorithm like Adam.

To make training more efficient and prevent overfitting, we use callbacks: special functions that monitor the training process and take actions automatically.

  • EarlyStopping: If the model stops improving on the validation set for several epochs, training is halted. This prevents wasting time and reduces overfitting by not training longer than necessary.
  • ModelCheckpoint: This saves the model’s weights whenever the validation accuracy improves. At the end of training, we have the best performing version of the model saved.
  • ReduceLROnPlateau: If the model stops improving, this callback reduces the learning rate. A smaller learning rate helps the network make finer adjustments to its weights and escape plateaus during training.
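The callbacks list passed to model.fit isn't shown in the excerpt; a plausible definition using the three Keras callbacks just described looks like this (monitors, patience values, and the learning-rate floor are illustrative assumptions):

```python
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

callbacks = [
    # Stop when validation loss stalls, keeping the best weights seen so far
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    # Save the model whenever validation accuracy improves
    ModelCheckpoint('best_doodle_model.h5', monitor='val_accuracy',
                    save_best_only=True),
    # Halve the learning rate when validation loss plateaus
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-5),
]
```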

history = model.fit(
    datagen.flow(X_train, y_train, batch_size=128),
    steps_per_epoch=len(X_train) // 128,
    epochs=50,
    validation_data=(X_val, y_val),
    callbacks=callbacks
)

After training, we achieve ~95% test accuracy, which is excellent for doodle recognition.

Step 8: Evaluating the Model

After training, the next crucial step is evaluating the CNN to understand how well it can recognize new doodles it has never seen before. This helps us verify that the model has learned meaningful patterns rather than just memorizing the training data.

test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")

  • X_test and y_test contain doodles and their corresponding labels that the model hasn’t seen during training.
  • test_loss indicates how far off the model’s predictions are from the true labels.
  • test_acc shows the fraction of correct predictions. In our case, the model typically achieves ~95% accuracy, meaning it correctly identifies 95 out of 100 doodles on average.

We also visualize:

  • Accuracy and Loss over epochs
  • Confusion matrix to see which categories are confused

Evaluation — Image by Author

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=CATEGORIES, yticklabels=CATEGORIES)

Confusion Matrix — Image by Author
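The cm matrix fed to sns.heatmap is the confusion matrix. A minimal sketch of computing it with scikit-learn, using small hypothetical label arrays in place of the real test-set results (in the project, y_pred would come from np.argmax(model.predict(X_test), axis=1)):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical stand-ins for true and predicted class indices
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])

# cm[i, j] counts samples of true class i predicted as class j,
# so correct predictions sit on the diagonal
cm = confusion_matrix(y_true, y_pred)
```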

Finally, we display some predictions vs. true labels to see the model in action.

Predictions — Image by Author

Step 9: Saving the Model

Once happy, we save our trained model for future use:

model.save('best_doodle_model.h5')

You can now load this model anytime to make predictions.
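Loading the saved model and querying it takes just a few lines. This self-contained sketch builds a trivial stand-in network so it runs anywhere, but the load_model and predict calls are the same ones you'd use with the real best_doodle_model.h5:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

CATEGORIES = ['cat', 'dog', 'house', 'tree', 'car', 'apple', 'banana', 'clock']

# Stand-in model so the example is self-contained; in practice you would
# skip this and load the file saved in Step 9 directly.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(len(CATEGORIES), activation='softmax'),
])
model.save('best_doodle_model.h5')

loaded = tf.keras.models.load_model('best_doodle_model.h5')

# A doodle arrives as a 28x28 grayscale array with values in [0, 1]
doodle = np.random.rand(1, 28, 28, 1).astype('float32')
probs = loaded.predict(doodle, verbose=0)[0]
predicted = CATEGORIES[int(np.argmax(probs))]
```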

Step 10: What’s Next? Deploying DoodlAI

With the model ready, the next step is building a web application:

  1. Backend: FastAPI + TensorFlow/Keras → Serve predictions in real-time
  2. Frontend: React → Canvas for drawing, game/free draw modes
  3. Deployment: Host on a web server, making it accessible to everyone

Users can then draw doodles on the web, and the AI will instantly predict the category with a confidence score.
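On the backend, each canvas submission has to be converted into the 28×28 normalized input the model expects. Here is a hypothetical preprocessing helper using block-average downsampling as a stand-in for proper PIL or OpenCV resizing; the function name and canvas size are illustrative, not the project's actual API:

```python
import numpy as np

def preprocess_canvas(canvas, size=28):
    """Downsample a square grayscale canvas (0-255) to the model's 28x28 input."""
    canvas = np.asarray(canvas, dtype='float32')
    h, w = canvas.shape
    bh, bw = h // size, w // size
    # Average each (bh x bw) block of pixels into one output pixel
    small = canvas[:size * bh, :size * bw].reshape(size, bh, size, bw).mean(axis=(1, 3))
    small /= 255.0                           # normalize to [0, 1], as in training
    return small.reshape(1, size, size, 1)   # add batch and channel dimensions

canvas = np.zeros((280, 280))                # e.g. a 280x280 drawing canvas
canvas[100:180, 100:180] = 255               # a bright square "doodle"
batch = preprocess_canvas(canvas)            # ready for model.predict(batch)
```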

DoodlAI — Image by Author
Predictions — Image by Author

Conclusion

DoodlAI is a fun and beginner-friendly project to learn deep learning and AI deployment. You’ll understand:

  • How CNNs recognize images
  • How to preprocess and augment data
  • How to train and evaluate models
  • How to save and deploy an AI model

It’s an exciting way to combine coding, AI, and creativity!


Published via Towards AI

