


Creating a Touchless Interface with Tensorflow.js

Author(s): Hammaad Memon

Deep Learning

Touchscreens were revolutionary. However, with the emergence of wireless technology and highly contagious diseases spreading on surfaces, there is a need for a refined way of interacting with applications. Introducing Touchless Interface.

A touchless interface
Source: By Ali Pazani from


Touchless Interface runs quite well on a large spectrum of devices, opening the doors for real-world implementation on many platforms — from touchless kiosks to VR game integration.

This write-up will walk through the code for a web-based touchless interface where users from any platform can interact with virtual blocks with the movement of their hands.



Before data can start being pushed through a deep learning model, there are a few tasks that need to be completed first.

  1. Loading Tensorflow.js
  2. Loading the HandPose Model
  3. Setting up the WebCam
  4. Creating randomly-generated blocks

TensorFlow.js is loaded from a CDN in the index.html file. Its script tag is placed after all the UI elements and before the local JavaScript tags, so the UI loads first, TensorFlow.js second, and the local JavaScript last.

The HandPose model is also loaded from a CDN, right after TensorFlow.js, for the same reasons. However, loading the HandPose model also requires executing the asynchronous call handpose.load(). In the code for Touchless Interface, steps 1–3 are each wrapped in a call to an attempt() function. Here’s the attempt() call for loading the HandPose model:

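// Load the MediaPipe handpose model.
const model = await attempt(
  // Task to run: download and initialize the model.
  async () => await handpose.load(),
  // On success: move the on-screen status to the next setup step.
  () => statusDisplay.textContent = "Setting up Webcam",
  // On failure: abort setup with an error message.
  () => fail("Failed to Load Model")
);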

The attempt() function takes three functions as parameters. The first is the function to execute. The second is executed if the first completes without errors, and the third is executed if the first throws an error. In this case, if await handpose.load() runs without errors, the on-screen status is updated to the next step; otherwise, the fail callback is executed to abort setup.
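The article does not show the body of attempt(), so the following is only a minimal sketch of how such a helper could look, inferred from the description above. The parameter names task, onSuccess, and onFail are assumptions, as is the choice to return undefined after a failure.

```javascript
// Hypothetical sketch of an attempt() helper; the original implementation
// is not shown in the article, so names and return values are assumptions.
async function attempt(task, onSuccess, onFail) {
  let result;
  try {
    // Run the task; awaiting also works if it returns a plain value.
    result = await task();
  } catch (error) {
    // The task threw or rejected: report it and signal failure to the caller.
    onFail(error);
    return undefined;
  }
  // Called outside the try block so that an error thrown by onSuccess
  // is not mistaken for a failure of the task itself.
  onSuccess(result);
  return result;
}
```

Returning the task's result is what lets the caller write const model = await attempt(...), as in the snippet above.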

The next task is to set up the webcam, which is extracted into the asynchronous function setupWebcam and likewise wrapped in an attempt() call. The setupWebcam function accesses the user’s webcam and sets it as the source of the on-screen video element. It also sets the global variables for the webcam’s size, which are essential for working out the hand’s offset within the video feed. Finally, it sets the playsinline property to true on the video element to ensure Safari compatibility.
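As a rough illustration of these steps, here is a sketch built on the standard navigator.mediaDevices.getUserMedia browser API. The article does not show the real setupWebcam body, so the parameters, the injectable mediaDevices argument, and the returned size object (in place of the article's global variables) are all assumptions.

```javascript
// Hypothetical sketch of setupWebcam; the article does not show its body.
// mediaDevices is injectable so the function can be exercised without a camera.
async function setupWebcam(video, mediaDevices = navigator.mediaDevices) {
  // Ask the browser for a video-only stream from the user's camera.
  const stream = await mediaDevices.getUserMedia({ video: true, audio: false });

  // Lets the feed play inside the page on Safari for iOS.
  video.playsInline = true;

  // Attach the stream and wait until the video dimensions are known.
  await new Promise((resolve) => {
    video.addEventListener("loadedmetadata", resolve, { once: true });
    video.srcObject = stream;
  });
  await video.play();

  // The article stores these in global variables; returning them is an
  // assumption made here to keep the sketch self-contained.
  return { webcamWidth: video.videoWidth, webcamHeight: video.videoHeight };
}
```

In a page, this could be called as setupWebcam(document.querySelector("video")), assuming a video element exists in the markup.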

The last setup task is to randomly generate the colored blocks for the user to interact with. This is done in the createObjects function, which uses document.createElement to create the HTML elements and then assigns each one a randomly chosen style preset.
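A block generator along these lines might look like the sketch below. The presets, the random positioning, and the function signature are invented for illustration; the article only states that createObjects uses document.createElement and random style presets.

```javascript
// Hypothetical style presets; the article does not list the real ones.
const STYLE_PRESETS = [
  { backgroundColor: "tomato", width: "80px", height: "80px" },
  { backgroundColor: "gold", width: "100px", height: "60px" },
  { backgroundColor: "teal", width: "60px", height: "100px" },
];

// Creates `count` block elements, gives each a random preset and a random
// position, and appends them to `container`. `doc` defaults to the page's
// document and `random` to Math.random; both are injectable for testing.
function createObjects(count, container, doc = document, random = Math.random) {
  const blocks = [];
  for (let i = 0; i < count; i++) {
    const block = doc.createElement("div");
    const preset = STYLE_PRESETS[Math.floor(random() * STYLE_PRESETS.length)];
    Object.assign(block.style, preset);
    block.style.position = "absolute";
    // Random placement, expressed as top/right percentages (an assumption).
    block.style.top = `${Math.floor(random() * 80)}%`;
    block.style.right = `${Math.floor(random() * 80)}%`;
    container.appendChild(block);
    blocks.push(block);
  }
  return blocks;
}
```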

Tracking the Hand

Having completed the initial setup, the rest of the code is executed within the main loop of the application. The first step in tracking the user’s hand is calling the asynchronous function model.estimateHands(videoRef), which uses the HandPose model to predict hands from the video element streaming the user’s webcam feed. The estimateHands function returns an array of predictions, of which the first is the most relevant. Each prediction object holds a lot of important information, such as the (x, y, z) coordinates of the palm base and of key points on every finger.

The first task is to move the on-screen target element based on the position of the hand in the webcam feed. The getHandCoords function takes prediction.annotations as input and uses it to retrieve the x and y coordinates of the palm base. It then returns the result of the following expression:
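To make the shape of this call concrete, the sketch below wraps it in a small helper. The name detectHand is hypothetical; model.estimateHands and the annotations object come from the HandPose model described above.

```javascript
// Returns the first detected hand, or null when no hand is in view.
// `model` is the object resolved by handpose.load(); `video` is the
// <video> element showing the webcam feed.
async function detectHand(model, video) {
  const predictions = await model.estimateHands(video);
  if (predictions.length === 0) {
    return null; // Nothing detected in this frame.
  }
  // Each prediction carries an annotations object whose entries (for example
  // palmBase and indexFinger) are arrays of [x, y, z] points.
  return predictions[0];
}
```

Returning null for an empty array gives the main loop an easy way to skip frames in which no hand is visible.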

return [
  ( palmX / webcamWidth ) * 100,  // horizontal offset as a percentage
  ( palmY / webcamHeight ) * 100  // vertical offset as a percentage
];

The position of the hand in the webcam frame is converted to a percentage offset, which is then used to set the right and top style properties of the on-screen target to match the hand’s position.
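Put together, the conversion and the style update could be sketched as follows. Passing the webcam size as arguments (the article uses global variables) and the moveTarget helper are assumptions made to keep the example self-contained; palmBase is the key HandPose uses for the palm point.

```javascript
// Converts the palm base position into percentage offsets of the webcam frame.
function getHandCoords(annotations, webcamWidth, webcamHeight) {
  // palmBase holds a single [x, y, z] point; z is not needed here.
  const [palmX, palmY] = annotations.palmBase[0];
  return [(palmX / webcamWidth) * 100, (palmY / webcamHeight) * 100];
}

// Hypothetical helper: positions the target using right/top percentages.
// Using `right` for the x offset mirrors the movement, so the target
// follows the hand the way a reflection would.
function moveTarget(target, [rightPercent, topPercent]) {
  target.style.right = `${rightPercent}%`;
  target.style.top = `${topPercent}%`;
}

// A palm at (320, 120) in a 640x480 frame maps to 50% and 25%.
const exampleCoords = getHandCoords({ palmBase: [[320, 120, 0]] }, 640, 480);
```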

Interacting with the UI

The final and perhaps most challenging step is allowing the user to pick up, move, and drop the randomly generated blocks. Here’s a brief explanation of the concept:

When the user’s fingers are outstretched and their palm flat towards the camera, the on-screen target should pass over any blocks without changing their position and drop any held block.

In contrast, when the user’s fingers are curled, or their hand is in a fist, any held block should continue to be moved. If no block is being held, passing over a block with a closed hand should pick it up and drag it along with the on-screen cursor.
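These two rules amount to a small state update that runs once per frame. The sketch below is one way to express it; the function name and its parameters are hypothetical and do not come from the Touchless Interface code.

```javascript
// Decides which block (if any) is held after the current frame.
//   heldItem      - the block currently held, or null
//   handOpen      - true when the hand is detected as open
//   grabbableItem - the block under the target, or null
function nextHeldItem(heldItem, handOpen, grabbableItem) {
  if (handOpen) {
    return null; // Open hand: drop whatever is held.
  }
  if (heldItem !== null) {
    return heldItem; // Closed hand, already holding: keep dragging.
  }
  return grabbableItem; // Closed hand, empty: pick up a block if one is near.
}
```

Keeping this decision in a pure function separates the grab logic from the DOM updates, which makes it easier to reason about each case.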

Introducing the isHandOpen library

Implementing the behavior described above depends on accurately determining whether the user’s hand is open. To extract this logic into a reusable component, I’ve compiled a pure JavaScript library that uses the predictions.annotations object to return true or false depending on whether the user’s hand is open. Here’s a link to the repository. It is important to note that the Touchless Interface repository ships with isHandOpen included. If not using Touchless Interface’s repository, download the minified isHandOpen JavaScript file and reference it with <script src="isHandOpen.min.js"></script> right above any local JavaScript tag.

This will define the isHandOpen function, which takes two parameters: the predictions.annotations object and the optional HAND_OPEN_BUFFER. The second parameter controls how many times in a row the hand must be detected as open before the isHandOpen function returns true. By default, HAND_OPEN_BUFFER is null, meaning the function returns exactly what it predicts, every time.

The recurring bug I found with the default behavior is the application dropping blocks while the user’s hand is closed. The HAND_OPEN_BUFFER effectively eliminates this problem of false positives, at the cost of dropping items HAND_OPEN_BUFFER calls to isHandOpen later, which in this case is a handful of milliseconds.
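The buffering idea can be illustrated with a small counter of consecutive open detections. The sketch below is not the isHandOpen library's code; createOpenBuffer is a hypothetical helper, and it assumes that true is returned as soon as the streak of open frames reaches the buffer size.

```javascript
// Hypothetical debounce illustrating the buffering idea; this is not the
// isHandOpen library's code. rawOpen is the unbuffered per-frame prediction.
function createOpenBuffer(bufferSize) {
  let consecutiveOpen = 0;
  return function bufferedIsOpen(rawOpen) {
    // Any closed frame resets the streak.
    consecutiveOpen = rawOpen ? consecutiveOpen + 1 : 0;
    return consecutiveOpen >= bufferSize;
  };
}

// With a buffer of 2, a single stray "open" frame is ignored.
const isOpenBuffered = createOpenBuffer(2);
const bufferedResults = [true, true, false, true, true].map((raw) => isOpenBuffered(raw));
// bufferedResults is [false, true, false, false, true]
```

The trade-off matches the one described above: a real open hand is reported one or more frames late, in exchange for ignoring isolated misdetections.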

Dragging Items

A simple call to isHandOpen with the predictions.annotations object and a HAND_OPEN_BUFFER of 2 returns a Boolean indicating whether the user’s hand is open. The next step is to allow the user to move the on-screen blocks. To pick up a block, the user’s hand must be closed and the cursor, or target, must be close to the block. If the hand is closed and no item is currently held, the canGrabItem function is executed. It compares the target’s top and right style values to those of every item on the screen and finds the closest object that can be picked up. If that item is close enough, it is stored in the global variable for the currently held item. That value is then read on subsequent iterations of the main loop, which displace the item along with the target cursor, moving it along with the user’s hand.
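The nearest-block search could be sketched as below. This is not the article's canGrabItem: the function name, the plain objects with numeric top and right percentages, and the maxDistance pickup radius are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the nearest-block search. Positions are the numeric
// top/right percentages; maxDistance is an assumed pickup radius.
function findGrabbableItem(target, items, maxDistance = 10) {
  let closest = null;
  let closestDistance = Infinity;
  for (const item of items) {
    // Straight-line distance in percentage units. This treats horizontal and
    // vertical percentages as equal, which is a simplification.
    const distance = Math.hypot(item.top - target.top, item.right - target.right);
    if (distance < closestDistance) {
      closest = item;
      closestDistance = distance;
    }
  }
  // Only report the closest item if it is within reach of the target.
  return closestDistance <= maxDistance ? closest : null;
}
```

In a page, the numeric positions could be read from each element's style with parseFloat(element.style.top) and parseFloat(element.style.right), assuming they were set as percentages.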


Touchless Interface uses the HandPose model from TensorFlow.js to locate a user’s hand in a webcam feed and project its position onto an on-screen cursor. It then integrates the isHandOpen library to determine whether the user’s hand is open. This information is then used to let the user grab, move, and drop on-screen widgets with their fingers and hand.


Touchless Interface in the browser is representative of the wide array of applications for this technology. From touchless solutions for public areas to software immersion at a new level, touchless interfaces have the capability to revolutionize how humans interact with computers.

Creating a Touchless Interface with Tensorflow.js was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
