Creating a Touchless Interface with TensorFlow.js
Last Updated on November 3, 2021 by Editorial Team
Author(s): Hammaad Memon
Deep Learning
Touchscreens were revolutionary. However, with the emergence of wireless technology and of highly contagious diseases that spread via shared surfaces, there is a need for a more refined way of interacting with applications. Introducing Touchless Interface.
Introduction
Touchless Interface runs quite well on a large spectrum of devices, opening the door for real-world implementation on many platforms, from touchless kiosks to VR game integration.
This write-up will walk through the code for a web-based touchless interface where users from any platform can interact with virtual blocks with the movement of their hands.
Useful Links:
- Touchless Interface (website)
- Touchless Interface GitHub Repository
- isHandOpen GitHub Repository (will be discussed later)
Setup
Before data can be pushed through a deep learning model, a few tasks need to be completed:
- Loading TensorFlow.js
- Loading the HandPose Model
- Setting up the Webcam
- Creating randomly-generated blocks
TensorFlow.js is loaded via a CDN in the index.html file. The script tag for TensorFlow.js is placed after all the UI elements and before the local JavaScript tags, allowing the UI to load first, TensorFlow.js second, and the local JavaScript last.
The HandPose Model is also loaded via a CDN right after TensorFlow.js, for the same reasons as above. However, loading the HandPose model also requires executing the asynchronous command handpose.load(). In the code for Touchless Interface, steps 1–3 are wrapped in attempt() calls. Here's the attempt() call that loads the HandPose Model:
// Load the MediaPipe handpose model.
const model = await attempt(
  async () => await handpose.load(),                     // the task to execute
  () => statusDisplay.textContent = "Setting up Webcam", // on success: advance the status
  () => fail("Failed to Load Model"),                    // on failure: abort setup
);
The attempt() function takes three functions as parameters. The first is the function to execute, the second runs if the first completes without any errors, and the third runs if the first throws an error. In this case, if await handpose.load() runs without errors, the on-screen status is updated to the next step; otherwise, a fail callback is executed to abort setup.
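The attempt() helper itself is not shown in the article; a minimal sketch consistent with that description (run a task, report success, or report failure), which may differ from the repository's actual implementation, could look like this:

async function attempt(task, onSuccess, onFailure) {
  try {
    const result = await task(); // run the asynchronous task
    onSuccess();                 // e.g. advance the on-screen status
    return result;               // hand the result back to the caller
  } catch (err) {
    onFailure();                 // e.g. abort setup with an error message
  }
}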
The next task is to set up the webcam, which is extracted into the asynchronous function setupWebcam and familiarly wrapped in an attempt() call. The setupWebcam function accesses the user's webcam and sets it as the source of the on-screen video element. It also sets the global variables for the webcam's dimensions, which are essential for deciphering the hand's offset within the video feed. Finally, it sets the video element's playsinline property to true to ensure Safari compatibility.
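As a rough illustration, a setupWebcam sketch built on the standard getUserMedia API could look like the following; the element id and the webcamWidth/webcamHeight globals are assumptions based on the description above:

async function setupWebcam() {
  // "webcam" is an assumed id for the on-screen video element.
  const videoRef = document.getElementById("webcam");
  videoRef.playsInline = true; // required for inline playback on Safari
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  videoRef.srcObject = stream;
  // Wait for metadata so the video's dimensions are available.
  await new Promise((resolve) => (videoRef.onloadedmetadata = resolve));
  await videoRef.play();
  webcamWidth = videoRef.videoWidth;   // assumed global used for hand offsets
  webcamHeight = videoRef.videoHeight; // assumed global used for hand offsets
  return videoRef;
}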
The last setup task is to randomly generate the colored blocks for the user to interact with. This is done in the createObjects function, which uses document.createElement to create the HTML elements and then applies randomly chosen style presets to them.
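A createObjects sketch along those lines might look like this; the class name, colors, and positioning are illustrative rather than the repository's exact presets:

function createObjects(count = 4) {
  const presets = ["red", "blue", "green", "orange"]; // example style presets
  for (let i = 0; i < count; i++) {
    const block = document.createElement("div");
    block.className = "block"; // assumed class providing size and absolute positioning
    // Apply a randomly chosen preset and a random position on screen.
    block.style.backgroundColor = presets[Math.floor(Math.random() * presets.length)];
    block.style.top = `${Math.random() * 80}%`;
    block.style.right = `${Math.random() * 80}%`;
    document.body.appendChild(block);
  }
}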
Tracking the Hand
Having completed the initial setup, the rest of the code is executed within the main loop of the application. The first step in tracking the user's hand is calling the asynchronous function model.estimateHands(videoRef), which uses the HandPose model to predict hands from the video element that is streaming the user's webcam feed. The estimateHands function returns an array of predictions, of which the first is the most relevant. Each prediction object holds a lot of important information, such as the (x, y, z) coordinates of the palm base and of key points on every finger.
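As a rough sketch, the main loop built around estimateHands might look like the following; the loop pacing via requestAnimationFrame is an assumption, and the repository may structure it differently:

async function mainLoop() {
  // Predict hands from the webcam-backed video element.
  const predictions = await model.estimateHands(videoRef);
  if (predictions.length > 0) {
    const prediction = predictions[0]; // the first prediction is the most relevant
    const [palmX, palmY] = prediction.annotations.palmBase[0];
    // ...move the on-screen target and handle grabbing/dropping here...
  }
  requestAnimationFrame(mainLoop); // schedule the next frame
}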
The first task is to move the on-screen target element based on the position of the hand in the webcam. The getHandCoords function takes prediction.annotations as input and uses it to retrieve the x and y coordinates of the palm base. It then returns the result of the following expression:
return [
  (palmX / webcamWidth) * 100,
  (palmY / webcamHeight) * 100
];
The position of the hand in the viewport is converted to a percentage offset, which is then used to set the right and top style attributes of the on-screen target to match the hand's position.
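Applying those percentages could look roughly like this, where targetEl is an assumed reference to the on-screen target element:

const [rightPercent, topPercent] = getHandCoords(prediction.annotations);
// The percentage offsets drive the target's right and top style attributes.
targetEl.style.right = `${rightPercent}%`;
targetEl.style.top = `${topPercent}%`;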
Interacting with the UI
The final and perhaps most challenging step is allowing the user to pick up, move, and drop the randomly generated blocks. Here's a brief explanation of the concept:
When the user's fingers are outstretched and their palm is flat towards the camera, the on-screen target should pass over any blocks without changing their position and drop any held block.
In contrast, when the user's fingers are curled, or their hand forms a fist, any held block should continue to be moved. If no block is being held, passing over a block with a closed hand should pick it up and drag it along with the on-screen cursor.
Introducing the isHandOpen library
The implementation of the behavior described above depends on accurately determining whether or not the user's hand is open. To extract this logic into an efficient, reusable component, I've compiled a pure JavaScript library that uses the predictions.annotations object to return a true or false value for whether the user's hand is open. Here's a link to the repository. It is important to note that the Touchless Interface repository ships with isHandOpen included. If not using Touchless Interface's repository, download the minified isHandOpen JavaScript file and reference it with <script src="isHandOpen.min.js"></script> right above any local JavaScript tag.

This defines the isHandOpen function, which takes two parameters: the predictions.annotations object and the optional HAND_OPEN_BUFFER. The second parameter controls how many times in a row the hand must be detected as open before isHandOpen returns true. By default, HAND_OPEN_BUFFER is null, meaning the function returns exactly what it predicts, every time. The recurring bug I found with that approach was the application dropping blocks while the user's hand was still closed. The HAND_OPEN_BUFFER effectively eliminates these false positives, at the cost of dropping items HAND_OPEN_BUFFER calls to isHandOpen later, which in this case is a handful of milliseconds.
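In code, a call inside the main loop looks roughly like this, where prediction is the first prediction returned by estimateHands:

// Require two consecutive open readings before reporting the hand as open.
const HAND_OPEN_BUFFER = 2;
const handOpen = isHandOpen(prediction.annotations, HAND_OPEN_BUFFER);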
Dragging Items
A simple call to isHandOpen with the predictions.annotations object and a HAND_OPEN_BUFFER of 2 returns a Boolean indicating whether the user's hand is open. The next step is to allow the user to move the on-screen blocks. In order to pick up a block, the user's hand must be closed and the cursor, or target, must be close to the block. If the hand is closed and no item is currently held, the canGrabItem function is executed; it compares the target's top and right style values to every item on the screen and finds the closest object that can be picked up. If that item is close enough, it is set as the value of the global variable for the currently held item. That value is then read on subsequent iterations of the main loop, which displaces the block along with the target cursor, moving it along with the user's hand.
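Putting those pieces together, the grab-and-drag logic might be sketched as follows; handOpen, heldItem, canGrabItem, rightPercent, and topPercent are names assumed from the description above and may not match the repository exactly:

if (handOpen) {
  heldItem = null; // open hand: drop whatever is currently held
} else if (heldItem === null) {
  canGrabItem(rightPercent, topPercent); // closed hand: may set the global heldItem
}
if (heldItem !== null) {
  // Drag the held block along with the on-screen target.
  heldItem.style.right = `${rightPercent}%`;
  heldItem.style.top = `${topPercent}%`;
}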
Summary
Touchless Interface utilizes the HandPose model from TensorFlow.js to locate a user's hand in the webcam feed and project its position onto an on-screen cursor. It then integrates the isHandOpen library to decipher whether or not the user's hand is open. This information is then used to allow the user to grab, move, and drop on-screen widgets with nothing more than their fingers and hand.
Implications
Touchless Interface in the browser is representative of the wide array of applications for this technology. From touchless solutions in public spaces to a new level of software immersion, touchless interfaces have the capability to revolutionize how humans interact with computers.