

Build Your First AI Project With a RaspberryPi: An Introduction to Computer Vision and ML Models With RaspberryPi for Python Coders.


Last Updated on January 10, 2024 by Editorial Team

Author(s): Deepali Gaur

Originally published on Towards AI.

Combining physical computing with computer vision and ML algorithms expands the range of tasks an AI system can perform. So I started this project with this idea:

‘I wanted to build a physical AI system, something that I could control just by wiggling my fingers in the air.’

I combined the IoT capabilities of the Pi with computer vision and a trained ML model from mediapipe to build these AI drums. We can conduct music and a light show by wiggling our fingers in the air, with each finger synced to a unique musical note and a unique light. For gesture control I chose the hand_landmarker ‘task’, which combines two trained ML models from mediapipe. I built this project on a Raspberry Pi, whose GPIO pins let you extend its functionality: you could control motors, automate things around your house with hand gestures, draw the curtains from another room with a bend of your pointer finger, or sync the lights in every room to a unique gesture.

So, whether you are an adult Python coder or a high schooler, here is a project to get you started on building your first AI system.

AI drums with RaspberryPi

Project Introduction

These AI drums are played by wiggling your fingers in the air: the bend of each fingertip plays a unique drum beat. The project tracks hands with computer vision and recognizes hand gestures with a pre-trained machine-learning model. I’ve used the mediapipe machine learning model for hand-landmark recognition and OpenCV for computer vision. With this model, we track the position of each fingertip relative to the other fingers, and our code syncs a drum beat to each fingertip. I’ve linked every finger to a unique drum beat and a unique LED, so we can play a symphony and control the light show just by wiggling our fingers in the air.

I have synced it to tabla-drum .WAV files, but you can sync it to any musical instrument of your choice.

LEDs are a starting point to demonstrate this functionality; you can also use the GPIO pins to control other electronics, such as servo motors, DC motors and sensors, through your gestures.

The mediapipe hand_landmarker model bundle is pre-trained: it has already been trained on thousands of hand images, so it is perfect for a beginner project (see the mediapipe documentation). We extract the hand-landmark coordinates from the model and write some logic on top of them to detect custom gestures.

I had a chance to test it with more than a hundred visitors at a recent maker faire, and this instrument is sure to amuse family and friends.

Prerequisites: Intermediate Python coding skills.

Here’s a demo clip:

AI air drums with RaspberryPi


Hardware requirements

  1. RaspberryPi 4B with the latest version of Raspberry Pi OS (formerly Raspbian) installed.
  2. You’ll need a monitor, a wired keyboard and a mouse for the RaspberryPi. If you are familiar with VNC/remote access to the Pi, you won’t need these.
  3. RaspberryPi camera
  4. A RaspberryPi camera tripod to mount your camera.
  5. You’ll need audio output: either external speakers connected to the Raspberry Pi’s audio port or the built-in monitor speakers. Here’s the link to the basic PC speakers that I used.
  6. Optional add-on for the light show: to add a light show synced to your musical notes, you’ll need a breadboard, LED lights, jumper wires and 220 Ω resistors. The LED buttons that I used can be purchased from this link.

Set up hardware and software

  1. Make sure you are using the latest version of RaspberryPi OS 64-bit. If you are new to Raspberry Pi, you can follow the detailed setup steps in the official Raspberry Pi documentation.
  2. Next, connect the camera to the RaspberryPi and make sure it is enabled: on the RaspberryPi desktop, go to Main menu -> Preferences -> RaspberryPi Configuration -> Interfaces -> Camera -> select Enabled -> OK. Then reboot your RaspberryPi.
  3. Check that you have chosen the correct audio channel.
  4. Install OpenCV. There are several steps to this process; this guide by Dr. Jolle Jolles is a good one to follow, and she also explains how to resolve the errors encountered during installation.
  5. Follow the mediapipe setup and installation guidelines for RaspberryPi.

Enable camera settings
Select the correct audio output
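Before writing any project code, it can help to confirm that the packages installed above are visible to the Python interpreter you will run the project with. The sketch below only checks whether each module can be found, using the standard library's `importlib.util.find_spec`, without importing it. The module names `cv2` (OpenCV) and `mediapipe` are the usual import names for these packages; the helper name `check_modules` is hypothetical.

```python
import importlib.util

def check_modules(names=("cv2", "mediapipe")):
    """Return {module_name: True/False} without actually importing the modules."""
    # find_spec returns None when a top-level module cannot be located
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
```

If either module shows up as MISSING, it is worth re-running the corresponding installation step before moving on.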

How it works

Now that all the packages are installed, it’s time to code. I have uploaded my complete code to my GitHub; the link is below.


I have used the hand_landmarker task from mediapipe (see the mediapipe documentation). It bundles two pre-trained models, a palm detection model and a hand landmark detection model, trained on thousands of hand images to detect the 21 hand-landmark coordinates (the wrist, the knuckles and the fingertips) shown in the image.

For our project, we are interested in the landmark coordinates of the four fingertips. We store the landmark numbers of these tips in a Python list to use in our code:

tip = [8, 12, 16, 20]

Python lists are zero-indexed, so:
tip[0] = 8: the hand landmark for the index_finger_tip
tip[1] = 12: the hand landmark for the middle_finger_tip
tip[2] = 16: the hand landmark for the ring_finger_tip
tip[3] = 20: the hand landmark for the pinky_tip
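It is easy to mix up list positions and landmark numbers, so here is a small sketch that writes out MediaPipe's 21-point numbering for the four fingers (the wrist, 0, and the thumb, 1 to 4, are left out). It also shows that the two joints just below each tip are `tip - 1` (the DIP joint) and `tip - 2` (the PIP joint), which the pinky check later relies on. The dictionary name `FINGER_LANDMARKS` is hypothetical, for illustration only.

```python
# Hypothetical lookup table; numbers follow MediaPipe's 21-point hand model.
# For each finger, landmark numbers increase from the knuckle (MCP) to the tip.
FINGER_LANDMARKS = {
    "index_finger": {"mcp": 5, "pip": 6, "dip": 7, "tip": 8},
    "middle_finger": {"mcp": 9, "pip": 10, "dip": 11, "tip": 12},
    "ring_finger": {"mcp": 13, "pip": 14, "dip": 15, "tip": 16},
    "pinky": {"mcp": 17, "pip": 18, "dip": 19, "tip": 20},
}

tip = [8, 12, 16, 20]  # same list as in the project code

for position, (finger, joints) in enumerate(FINGER_LANDMARKS.items()):
    # tip[position] is the landmark number of this finger's tip
    assert tip[position] == joints["tip"]
    # The DIP and PIP joints sit one and two landmark numbers below the tip
    assert joints["dip"] == joints["tip"] - 1
    assert joints["pip"] == joints["tip"] - 2
    print(f"tip[{position}] = {joints['tip']:2d} -> {finger}")
```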

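The model divides the hand into 21 landmarks, and each landmark has (x, y, z) coordinates. x and y are normalized to the frame width and height, with the upper left of the frame at (0, 0) and the bottom right at (1, 1). z is the landmark depth, measured relative to the wrist: the smaller the value, the closer the landmark is to the camera.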
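Because x and y are fractions of the frame size, they are usually scaled to pixels before comparing positions. The comparisons in the project code read `a[...][2:]`, which suggests `a` holds one `[id, x, y]` entry per landmark in pixel units; that layout is an assumption inferred from the slicing. A minimal sketch of building such a list, assuming each landmark object exposes `.x` and `.y` like MediaPipe's normalized landmarks (a `namedtuple` stands in for them so the sketch runs without a camera; `to_pixel_list` is a hypothetical name):

```python
from collections import namedtuple

# Stand-in for a MediaPipe normalized landmark (only .x and .y are used here)
Landmark = namedtuple("Landmark", ["x", "y", "z"])

def to_pixel_list(landmarks, width, height):
    """Convert normalized landmarks to [id, x_px, y_px] entries.

    (0, 0) is the top-left of the frame and (width, height) the bottom-right,
    so a larger y_px means the point is LOWER in the image.
    """
    return [[i, int(lm.x * width), int(lm.y * height)] for i, lm in enumerate(landmarks)]

# Two made-up landmarks in a 640x480 frame
points = [Landmark(0.5, 0.25, 0.0), Landmark(0.5, 0.75, 0.0)]
a = to_pixel_list(points, 640, 480)
print(a)  # [[0, 320, 120], [1, 320, 360]]
```

With mediapipe's legacy `mp.solutions.hands` API, the landmarks of the first detected hand are available as `results.multi_hand_landmarks[0].landmark`, and an OpenCV frame's size is `frame.shape[1]` (width) by `frame.shape[0]` (height).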

For our project, we need to detect when the tip of a finger is lower than the tips of the other three fingers, and then play the musical note linked with that finger.

Code: Recognizing patterns for each finger

The code works by recognising the bend patterns for each finger.

For instance, for the index finger (hand landmark 8), we consider that a person intends to play a note with the index finger when the tip of the index finger is below the tips of the other fingers. [Note that the value of the ‘y’ coordinate increases downwards, with the bottom right of the frame being the maximum.] If this condition is true, we play the drum beat linked with the index finger, and we stop playing when the condition stops being true. Similar logic is used to play notes with the other fingers.

Recognising the bend patterns for each finger.
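The basic rule, that a tip counts as "pressed" when its y value is larger than the y values it is compared with, fits in one small function. This is a simplified sketch: it compares a tip against the other fingertips as described above, whereas the project code compares some fingers against neighbouring joints instead. The function name `is_lower_than_all` is hypothetical, and `a` is assumed to be a list of `[id, x, y]` entries in pixel coordinates.

```python
def is_lower_than_all(a, tip_id, other_ids):
    """True if landmark tip_id is lower in the frame (larger y) than every landmark in other_ids.

    a is assumed to be a list of [id, x, y] entries in pixel coordinates.
    """
    tip_y = a[tip_id][2]
    return all(tip_y > a[other][2] for other in other_ids)

# Synthetic frame: 21 landmarks, all at y=100, then bend the index finger down
a = [[i, 0, 100] for i in range(21)]
a[8][2] = 150  # index fingertip drops below the others
print(is_lower_than_all(a, 8, [12, 16, 20]))   # True
print(is_lower_than_all(a, 12, [8, 16, 20]))   # False
```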

Now, to play a note with the pinky: the pinky is the shortest of the four fingers, so when the hand is held upright its tip (landmark 20) always sits lower than the tips of the other three fingers, and comparing it with them would not work. Instead, we compare it with landmarks 19 and 18 on the pinky itself. If the tip is lower than landmark 19 or 18, we consider the finger bent, assume the user intends to play a note with it, and play the drum beat linked to it.
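A minimal sketch of the pinky check, assuming the same `[id, x, y]` pixel layout for `a`; the names `PINKY_TIP` and `pinky_is_bent` are hypothetical. It uses landmark 19 (the DIP joint) and 18 (the PIP joint), i.e. `tip - 1` and `tip - 2`:

```python
PINKY_TIP = 20  # hypothetical constant for MediaPipe landmark 20

def pinky_is_bent(a):
    """True if the pinky tip is lower (larger y) than its DIP (19) or PIP (18) joint."""
    tip_y = a[PINKY_TIP][2]
    return tip_y > a[PINKY_TIP - 1][2] or tip_y > a[PINKY_TIP - 2][2]

# Upright pinky: y decreases from the PIP joint to the tip (the tip is highest in the frame)
upright = [[i, 0, 300] for i in range(21)]
upright[18][2], upright[19][2], upright[20][2] = 260, 240, 220
# Curled pinky: the tip folds back down below the DIP joint
curled = [row[:] for row in upright]
curled[20][2] = 255

print(pinky_is_bent(upright))  # False
print(pinky_is_bent(curled))   # True
```

Because the pinky is only compared with its own joints, its naturally lower tip does not trigger a note by itself.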

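# tip = [8, 12, 16, 20]; a = list of [id, x, y] landmark positions in pixels
# x[id] = 0 when finger id is bent (down), 1 when it is upright (up)
x = [1, 1, 1, 1]
for id in range(0, 4):
    # A finger counts as bent when its tip is lower (larger y) than its reference points
    if tip[id] == 8:  # index: compared with the middle (11) and ring (15) DIP joints
        if (a[tip[id]][2:] > a[11][2:]) and (a[tip[id]][2:] > a[15][2:]):
            x[id] = 0
    if tip[id] == 12:  # middle: compared with the index, ring and pinky tips
        if (a[tip[id]][2:] > a[8][2:]) and (a[tip[id]][2:] > a[16][2:]) and (a[tip[id]][2:] > a[20][2:]):
            x[id] = 0
    if tip[id] == 16:  # ring: compared with joints 7 and 11 and the pinky tip
        if (a[tip[id]][2:] > a[7][2:]) and (a[tip[id]][2:] > a[11][2:]) and (a[tip[id]][2:] > a[20][2:]):
            x[id] = 0
    if tip[id] == 20:  # pinky: compared with its own joints 19 and 18
        if (a[tip[id]][2:] > a[tip[id]-1][2:]) or (a[tip[id]][2:] > a[tip[id]-2][2:]):
            x[id] = 0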

Setting ML Model Parameters:

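We can set parameters such as the number of hands to track and the detection confidence for our hand-landmark model. Let’s look at these settings in the line of code below, which uses mediapipe’s legacy Hands solution (handsModule is presumably mediapipe.solutions.hands), built on the same palm detection and hand landmark models: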

with handsModule.Hands(static_image_mode=False, min_detection_confidence=0.7, min_tracking_confidence=0.7, max_num_hands=1) as hands:

For this project, we limit the number of tracked hands to one, using the ‘max_num_hands’ parameter in the line above.

static_image_mode=False, since we are processing a live video stream.

min_tracking_confidence: the minimum confidence score, between 0.0 and 1.0, for the hand landmarks to be considered successfully tracked; below it, hand detection runs again on the next frame.

min_detection_confidence: the minimum confidence score, between 0.0 and 1.0, for hand detection by the palm detection model to be considered successful.
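The two confidence thresholds are the main knobs you are likely to tune on the Pi (per the mediapipe documentation, a higher min_tracking_confidence makes tracking more robust at the cost of extra latency). A minimal sketch of keeping these settings in one place and checking their ranges before starting the camera loop; the names `HANDS_SETTINGS` and `validate_hands_settings` are hypothetical, and the keys are the keyword arguments used in the `handsModule.Hands(...)` call above:

```python
# Hypothetical settings dict; keys match the keyword arguments of the Hands(...) call
HANDS_SETTINGS = {
    "static_image_mode": False,       # treat input as a video stream
    "max_num_hands": 1,               # track a single hand
    "min_detection_confidence": 0.7,  # palm detection threshold
    "min_tracking_confidence": 0.7,   # landmark tracking threshold
}

def validate_hands_settings(settings):
    """Raise ValueError if a confidence is outside [0.0, 1.0] or max_num_hands < 1."""
    for key in ("min_detection_confidence", "min_tracking_confidence"):
        value = settings[key]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must be between 0.0 and 1.0, got {value}")
    if settings["max_num_hands"] < 1:
        raise ValueError("max_num_hands must be at least 1")
    return settings

validate_hands_settings(HANDS_SETTINGS)
# In the project this could be passed on as: handsModule.Hands(**HANDS_SETTINGS)
```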

Code Directory:

I have uploaded the code to my GitHub. You can download it from the repository linked below and run it after completing all the installations above. I have tested the code several times and it works.

GitHub — Deepali-G16/AI-Percussion-instrument-with-OpenCV-and-ML

Control lights with your gestures:

I connected four LEDs to the RaspberryPi’s GPIO pins and synced each finger to a unique LED, so that bending a finger turns the corresponding LED on and bringing the finger back upright turns it off.

Schematics for LEDs connected to RaspberryPi GPIO pins
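A minimal sketch of the finger-to-LED mapping, assuming one flag per finger with 0 for a bent finger and 1 for an upright one, matching the `x[...] == 0  # ... is down` checks in the project code. The GPIO pin numbers are hypothetical placeholders (the real ones come from your own wiring), and the LEDs are passed in as objects with `on()` and `off()` methods, which is how the `gpiozero` library's `LED` class behaves. A small fake class stands in for it so the sketch runs off the Pi.

```python
# Hypothetical BCM pin numbers, one per finger: index, middle, ring, pinky
LED_PINS = [17, 27, 22, 23]

class FakeLED:
    """Stand-in for gpiozero.LED so the logic can be tested without a Pi."""
    def __init__(self, pin):
        self.pin = pin
        self.is_lit = False

    def on(self):
        self.is_lit = True

    def off(self):
        self.is_lit = False

def update_leds(x, leds):
    """Light the LED of every bent finger (flag 0) and switch off the rest."""
    for flag, led in zip(x, leds):
        if flag == 0:
            led.on()
        else:
            led.off()

# On the Pi you could build the list with: from gpiozero import LED; leds = [LED(p) for p in LED_PINS]
leds = [FakeLED(p) for p in LED_PINS]
update_leds([0, 1, 1, 0], leds)  # index and pinky bent
print([led.is_lit for led in leds])  # [True, False, False, True]
```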

The code for this is simple: it sits under the same condition as the music. We play the musical note for each finger as long as the condition is true and stop playing when it is false (for instance, when the tip of the pinky is back above its knuckles in the frame).

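# x holds one flag per finger: 0 = bent (down), 1 = upright (up)
if x[0] == 0:  # index finger (pointer) is down
    ...  # play the index finger's drum beat and turn its LED on
if x[0] == 1:  # index finger (pointer) is up
    ...  # stop its drum beat and turn its LED off
if x[1] == 0:  # middle finger is down
    ...
if x[1] == 1:  # middle finger is up
    ...
if x[2] == 0:  # ring finger is down
    ...
if x[2] == 1:  # ring finger is up
    ...
if x[3] == 0:  # pinky is down
    ...
if x[3] == 1:  # pinky is up
    ...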
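The checks above look at every flag on every frame. If you notice a beat being re-triggered on each frame while a finger stays bent, one option is to compare this frame's flags with the previous frame's and act only on changes. This edge-detection approach is a suggestion, not necessarily what the original project does; the function name `finger_changes` is hypothetical.

```python
def finger_changes(previous, current):
    """Compare two flag lists (0 = bent, 1 = upright).

    Returns (started, stopped): indices of fingers that just bent (start their
    sound and LED) and fingers that were just raised (stop them).
    """
    pairs = list(enumerate(zip(previous, current)))
    started = [i for i, (p, c) in pairs if p == 1 and c == 0]
    stopped = [i for i, (p, c) in pairs if p == 0 and c == 1]
    return started, stopped

prev = [1, 1, 0, 1]  # ring finger already bent
curr = [0, 1, 0, 1]  # index finger bends this frame
print(finger_changes(prev, curr))          # ([0], [])
print(finger_changes(curr, [1, 1, 1, 1]))  # ([], [0, 2])
```

Keeping the previous frame's flags in a variable outside the camera loop is all the extra state this needs.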

Now, You Can Play the Drums With These Hand Gestures:

1. Place your hand in front of the camera (near the black line) until it’s clearly visible in the frame.

2. Wiggle your fingers to play the drums and control the light show! Each finger plays a unique beat.


3. To STOP the drums, place your hand upright in front of the camera.

Place hand upright to STOP playing the drums.

Variations/Next steps:

You can add your own variations to this musical project.

I have chosen tabla drums, but you could use piano or any other sound .WAV files, or use the thumb for volume control or to control a motor, and so on. Or take it a step further and get everyone around you moving by combining it with the pose detection model from mediapipe.

Pre-trained models from mediapipe are a great place to start your ML journey. You can begin by including a pre-trained ML model in your Python code, and once you are familiar with it and understand its constraints, take the next step and customize it with mediapipe Model Maker.


Published via Towards AI
