Creating a Touchless Interface with TensorFlow.js
Last Updated on November 3, 2021 by Editorial Team
Author(s): Hammaad Memon
Touchscreens were revolutionary. However, with the emergence of wireless technology and highly contagious diseases spreading on surfaces, there is a need for a refined way of interacting with applications. Introducing Touchless Interface.
Touchless Interface runs quite well on a large spectrum of devices, opening the doors for real-world implementation on many platforms — from touchless kiosks to VR game integration.
This write-up will walk through the code for a web-based touchless interface where users from any platform can interact with virtual blocks with the movement of their hands.
- Touchless Interface (website)
- Touchless Interface GitHub Repository
- isHandOpen GitHub Repository (will be discussed later)
Before any data can be pushed through a deep learning model, a few setup tasks need to be completed:
- Loading TensorFlow.js
- Loading the HandPose Model
- Setting up the WebCam
- Creating randomly-generated blocks
The HandPose model is also loaded via CDN right after TensorFlow.js, for the same reasons as above. Loading it, however, also requires an asynchronous call: handpose.load(). In the code for Touchless Interface, the first three setup tasks are each wrapped in a call to an attempt() function. Here’s the attempt() call that loads the HandPose model:
// Load the MediaPipe handpose model.
const model = await attempt(
  async () => await handpose.load(),
  () => statusDisplay.textContent = "Setting up Webcam",
  () => fail("Failed to Load Model"),
);
The attempt() function takes three functions as parameters. The first is the function to execute. The second is executed if the first completes without any errors, and the third is executed if the first throws an error. In this case, if await handpose.load() runs without errors, the on-screen status is updated to the next step; otherwise, the fail callback is executed to abort setup.
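The project's own attempt() implementation is not reproduced in this write-up, so the following is only a minimal sketch of a helper that matches the description above. The parameter names (task, onSuccess, onFail) and the choice to return undefined on failure are assumptions made for illustration.

```javascript
// Hypothetical sketch of an attempt() helper; the project's real version may differ.
async function attempt(task, onSuccess, onFail) {
  let result;
  try {
    // Run the task; awaiting also handles tasks that return a promise.
    result = await task();
  } catch (error) {
    // The task threw (or its promise rejected): run the failure callback.
    onFail(error);
    return undefined;
  }
  // The task finished cleanly: run the success callback and hand back the result.
  onSuccess(result);
  return result;
}
```

Calling onSuccess outside the try block means an error thrown by the success callback is not mistaken for a failure of the task itself.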
The next task is to set up the webcam. This is extracted into the asynchronous function setupWebcam and, as before, wrapped in an attempt() call. The setupWebcam function accesses the user’s webcam and sets it as the source of the on-screen video element. It also sets the global variables for the webcam’s size, which will be essential for working out the hand’s offset within the video feed. Finally, it sets the playsinline property of the video element to true to ensure Safari compatibility.
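A webcam setup along these lines could be sketched as follows. This is not the project's code: the function signature, the injected mediaDevices parameter, and returning the size instead of writing global variables are assumptions chosen to keep the example self-contained.

```javascript
// Sketch only: the signature and return shape are assumptions, not the project's code.
async function setupWebcam(video, mediaDevices = navigator.mediaDevices) {
  // Ask for a front-facing video stream without audio.
  const stream = await mediaDevices.getUserMedia({
    video: { facingMode: "user" },
    audio: false,
  });

  video.srcObject = stream; // show the stream in the <video> element
  video.playsInline = true; // keep iOS Safari from forcing fullscreen playback

  // Report the capture size so hand coordinates can be scaled later.
  const { width, height } = stream.getVideoTracks()[0].getSettings();
  return { width, height };
}
```

In the browser this would be called as setupWebcam(document.querySelector("video")), and the returned width and height would play the role of the article's global webcam size variables.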
The last setup task is to randomly generate the colored blocks for the user to interact with. This is done in the createObjects function, which uses document.createElement to create the HTML elements and then assigns each one a randomly chosen style preset.
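As an illustration of the idea, block generation might look something like the sketch below. The STYLE_PRESETS values, the function parameters, and the random placement range are all hypothetical; the real project defines its own presets and layout.

```javascript
// Hypothetical presets; the real project defines its own styles.
const STYLE_PRESETS = [
  { background: "tomato", width: "80px", height: "80px" },
  { background: "gold", width: "100px", height: "60px" },
  { background: "skyblue", width: "60px", height: "100px" },
];

// Sketch only: creates `count` blocks inside `container`.
// `doc` is injectable so the function can run outside a browser.
function createObjects(count, container, doc = document) {
  const blocks = [];
  for (let i = 0; i < count; i++) {
    const block = doc.createElement("div");

    // Pick one preset at random and copy its properties onto the element.
    const preset = STYLE_PRESETS[Math.floor(Math.random() * STYLE_PRESETS.length)];
    Object.assign(block.style, preset);

    // Place the block at a random offset, using the same top/right
    // percentage scheme the article uses for the on-screen target.
    block.style.position = "absolute";
    block.style.top = `${Math.random() * 80}%`;
    block.style.right = `${Math.random() * 80}%`;

    container.appendChild(block);
    blocks.push(block);
  }
  return blocks;
}
```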
Tracking the Hand
With the initial setup complete, the rest of the code executes within the main loop of the application. The first step in tracking the user’s hand is calling the asynchronous function model.estimateHands(videoRef), which uses the HandPose model to predict hands from the video element that is streaming the user’s webcam feed. The estimateHands function returns an array of predictions, of which the first is the most relevant. Each prediction object carries useful information, such as the (x, y, z) coordinates of the palm base and of key points on every finger.
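One iteration of such a loop could be sketched as below. The helper name trackOnce and the onHand callback are assumptions for illustration; only model.estimateHands and the shape of its result come from the HandPose model.

```javascript
// Sketch of a single tracking step; trackOnce and onHand are hypothetical names.
async function trackOnce(model, video, onHand) {
  // estimateHands resolves to an array of predictions (empty if no hand is seen).
  const predictions = await model.estimateHands(video);
  if (predictions.length === 0) {
    return false; // no hand in this frame
  }
  // Only the first prediction is used, as described above.
  onHand(predictions[0]);
  return true;
}

// In the browser, the step could be repeated once per animation frame:
//
//   async function loop() {
//     await trackOnce(model, videoRef, (prediction) => {
//       console.log(prediction.annotations.palmBase);
//     });
//     requestAnimationFrame(loop);
//   }
```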
The first task is to move the on-screen target element based on the position of the hand in the webcam. The getHandCoords function takes in the prediction.annotations as input and uses that to retrieve the x and y coordinates of the palm base. It then sends back the result of the following expression:
( palmX / webcamWidth ) * 100,
( palmY / webcamHeight ) * 100
The position of the hand in the video feed is thus converted to a percentage offset, which is then used to set the right and top style attributes of the on-screen target to match the hand’s position.
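The conversion above can be sketched as a pair of small functions. Passing the webcam size as parameters (the article uses global variables) and the moveTarget helper are assumptions made to keep the example self-contained; palmBase is the annotation key the HandPose model uses for the palm.

```javascript
// Sketch: convert the palm position to percentage offsets.
function getHandCoords(annotations, webcamWidth, webcamHeight) {
  // palmBase holds a single [x, y, z] point in video pixel coordinates.
  const [palmX, palmY] = annotations.palmBase[0];
  return [
    (palmX / webcamWidth) * 100,
    (palmY / webcamHeight) * 100,
  ];
}

// Hypothetical helper: apply the offsets to the on-screen target element.
function moveTarget(target, [right, top]) {
  target.style.right = `${right}%`;
  target.style.top = `${top}%`;
}
```

For a palm at pixel (320, 120) in a 640×480 feed, getHandCoords returns [50, 25]. Setting right instead of left has the side effect of mirroring the horizontal axis, which is plausibly why it is used here: an unmirrored webcam image shows the user's right hand on the left of the frame.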
Interacting with the UI
The final and perhaps most challenging step is allowing the user to pick up, move, and drop the randomly generated blocks. Here’s a brief explanation of the concept:
When the user’s fingers are outstretched and their palm flat towards the camera, the on-screen target should pass over any blocks without changing their position and drop any held block.
In contrast, when the user’s fingers are curled, or their hand forms a fist, any held block should continue to be moved. If no block is being held, passing over a block with a closed hand should pick it up and drag it along with the on-screen cursor.
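The two rules above amount to a small state transition that runs once per frame. The sketch below expresses it as a pure function; the name nextHeldItem and its parameters are hypothetical, and the open/closed flag is assumed to be computed elsewhere.

```javascript
// Sketch of the per-frame grab/drop decision; all names are hypothetical.
// handOpen:   true if the hand is open in this frame
// heldItem:   the block currently being dragged, or null
// nearbyItem: a block close enough to grab, or null
function nextHeldItem(handOpen, heldItem, nearbyItem) {
  if (handOpen) {
    return null; // open hand: drop any held block and pass over the rest
  }
  if (heldItem !== null) {
    return heldItem; // closed hand with a block: keep dragging it
  }
  return nearbyItem; // closed hand, nothing held: grab a nearby block if there is one
}
```

Keeping the decision separate from the DOM updates makes each case easy to check in isolation.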
Introducing the isHandOpen library
A simple call to isHandOpen with the predictions.annotations object and a HAND_OPEN_BUFFER of 2 returns a Boolean indicating whether or not the user’s hand is open. The next step is to allow the user to move the on-screen blocks. To pick up a block, the user’s hand must be closed and the cursor, or target, must be close to the block. If the hand is closed and no item is currently held, the canGrabItem function is executed. It compares the target’s top and right style values to those of every item on the screen and finds the closest object that can be picked up. If that item is close enough, it is stored in the global variable for the currently held item. That value is then read on subsequent iterations of the main loop, which displace the item along with the target cursor so that it moves with the user’s hand.
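A nearest-item search of this kind might be sketched as follows. The GRAB_RADIUS threshold, the function signature, and returning the item instead of writing a global variable are assumptions; the project's actual canGrabItem may measure distance differently.

```javascript
// Hypothetical threshold, in percentage points of the top/right offsets.
const GRAB_RADIUS = 10;

// Sketch: return the item closest to the target if it is within reach, else null.
function canGrabItem(target, items, radius = GRAB_RADIUS) {
  // Style values look like "42.5%"; parseFloat strips the unit.
  const targetTop = parseFloat(target.style.top);
  const targetRight = parseFloat(target.style.right);

  let closest = null;
  let closestDistance = Infinity;
  for (const item of items) {
    const dTop = parseFloat(item.style.top) - targetTop;
    const dRight = parseFloat(item.style.right) - targetRight;
    const distance = Math.hypot(dTop, dRight);
    if (distance < closestDistance) {
      closest = item;
      closestDistance = distance;
    }
  }
  return closestDistance <= radius ? closest : null;
}
```

Treating vertical and horizontal percentages as equal units is a simplification, since one percent of the viewport height and one percent of its width usually differ in pixels.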
Touchless Interface uses the HandPose model from TensorFlow.js to locate a user’s hand in a webcam feed and projects its position onto an on-screen cursor. It then integrates the isHandOpen library to determine whether or not the user’s hand is open. This information is used to let the user grab, move, and drop on-screen widgets with their fingers and hand.
Touchless Interface in the browser is representative of the wide array of applications for this technology. From touchless solutions for public areas to software immersion at a new level, touchless interfaces have the capability to revolutionize how humans interact with computers.
Published via Towards AI