
Creating a Touchless Interface with Tensorflow.js

Last Updated on November 3, 2021 by Editorial Team

Author(s): Hammaad Memon

Deep Learning

Touchscreens were revolutionary. However, with the emergence of wireless technology and highly contagious diseases that spread via shared surfaces, there is a need for a more refined way of interacting with applications. Introducing Touchless Interface.

A touchless interface. Source: Ali Pazani, pexels.com

Introduction

Touchless Interface runs quite well on a large spectrum of devices, opening the doors for real-world implementation on many platforms, from touchless kiosks to VR game integration.

This write-up will walk through the code for a web-based touchless interface where users on any platform can interact with virtual blocks through the movement of their hands.


Setup

Before data can be pushed through a deep learning model, a few setup tasks need to be completed first:

  1. Loading Tensorflow.js
  2. Loading the HandPose Model
  3. Setting up the Webcam
  4. Creating randomly-generated blocks

Tensorflow.js is simply loaded via a CDN in the index.html file. The script tag for Tensorflow.js is placed before the local JavaScript tags and after all the UI elements, so that the UI loads first, Tensorflow.js second, and the local JavaScript last.
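
For reference, here is a minimal sketch of how the script tags might be ordered in index.html. The element ids, file names, and CDN URLs below are assumptions for illustration, not the project's exact markup.

<body>
  <!-- UI elements come first so they render before any scripts load -->
  <video id="webcam" autoplay playsinline></video>
  <div id="target"></div>
  <p id="status">Loading Model...</p>

  <!-- Tensorflow.js and the HandPose model, loaded via CDN -->
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/handpose"></script>

  <!-- Local JavaScript loads last -->
  <script src="isHandOpen.min.js"></script>
  <script src="main.js"></script>
</body>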

The HandPose Model is also loaded via CDN, right after Tensorflow.js, for the same reasons as above. However, loading the HandPose model also requires executing the asynchronous call handpose.load(). In the code for Touchless Interface, steps 1–3 are wrapped in calls to an attempt() function. Here's the attempt() call for loading the HandPose Model:

// Load the MediaPipe handpose model.
const model = await attempt(
  async () => await handpose.load(),
  () => statusDisplay.textContent = "Setting up Webcam",
  () => fail("Failed to Load Model"),
);

The attempt() function takes three functions as parameters. The first is the function to execute. The second is executed if the first completes without any errors, and the last is executed if the first throws an error. In this case, if await handpose.load() runs without any errors, the on-screen status is updated to the next step; otherwise, a fail callback is executed to abort setup.
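
attempt() itself is not shown here, but based on the description above it could look roughly like this (a minimal sketch, assuming it simply forwards the task's return value):

// Hypothetical sketch of attempt(): run `task`, report success or failure.
async function attempt(task, onSuccess, onFail) {
  try {
    const result = await task();   // e.g. await handpose.load()
    onSuccess();                   // e.g. update the on-screen status
    return result;
  } catch (err) {
    onFail();                      // e.g. abort setup with an error message
  }
}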

The next task is to set up the webcam, which is extracted into the asynchronous function setupWebcam and, familiarly, wrapped in an attempt() call. The setupWebcam function accesses the user's webcam and sets it as the source of the on-screen video element. It also sets the global variables for the webcam's size, which will be essential for computing the hand's offset within the video feed. Finally, it sets the playsinline property to true on the video element to ensure Safari compatibility.
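
A rough sketch of what setupWebcam might look like, assuming a video element with id "webcam" and global webcamWidth/webcamHeight variables (the exact names are assumptions):

async function setupWebcam() {
  const video = document.getElementById("webcam");
  // Ask the browser for the user's webcam and stream it into the video element
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  video.srcObject = stream;
  video.setAttribute("playsinline", true); // Safari compatibility
  // Wait for metadata so the video's dimensions are known
  await new Promise((resolve) => (video.onloadedmetadata = resolve));
  webcamWidth = video.videoWidth;
  webcamHeight = video.videoHeight;
  return video;
}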

The last setup task is to randomly generate the colored blocks for the user to interact with. This is done in the createObjects function, which uses document.createElement to create the HTML elements and then applies randomly chosen style presets to them.
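
An illustrative version of createObjects; the container id, class names, and style presets here are assumptions rather than the project's exact values:

const STYLE_PRESETS = ["block-red", "block-blue", "block-green"];

function createObjects(count = 5) {
  const container = document.getElementById("blocks");
  for (let i = 0; i < count; i++) {
    const block = document.createElement("div");
    // Apply a randomly chosen style preset and a random position
    block.classList.add("block", STYLE_PRESETS[Math.floor(Math.random() * STYLE_PRESETS.length)]);
    block.style.top = `${Math.random() * 80}%`;
    block.style.right = `${Math.random() * 80}%`;
    container.appendChild(block);
  }
}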

Tracking the Hand

Having completed the initial setup, the rest of the code executes within the main loop of the application. The first step in tracking the user's hand is calling the asynchronous function model.estimateHands(videoRef), which uses the HandPose model to detect hands in the video element streaming the user's webcam feed. The estimateHands function returns an array of predictions, of which the first is the most relevant. The prediction object holds a lot of important information, such as the (x, y, z) coordinates of the palm base and of key points on every finger.
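
In code, the per-frame prediction step might look like this (variable names follow the description above and are otherwise assumptions):

const predictions = await model.estimateHands(videoRef);
if (predictions.length > 0) {
  const prediction = predictions[0]; // the first prediction is the most relevant
  const [rightPct, topPct] = getHandCoords(prediction.annotations);
  // ...move the on-screen target, check whether the hand is open, etc.
}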

The first task is to move the on-screen target element based on the position of the hand in the webcam. The getHandCoords function takes prediction.annotations as input and uses it to retrieve the x and y coordinates of the palm base. It then returns the result of the following expression:

return [
  ( palmX / webcamWidth ) * 100,
  ( palmY / webcamHeight ) * 100
];

The position of the hand in the viewport is converted to percentage offsets, which are then used to set the right and top style attributes of the on-screen target to match the hand's position.
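
Applying those percentages to the target could look like this (the element id is an assumption):

const target = document.getElementById("target");
const [rightPct, topPct] = getHandCoords(prediction.annotations);
target.style.right = `${rightPct}%`; // the x offset maps to `right`, mirroring the webcam
target.style.top = `${topPct}%`;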

Interacting with the UI

The final and perhaps most challenging step is allowing the user to pick up, move, and drop the randomly generated blocks. Here's a brief explanation of the concept:

When the user's fingers are outstretched and their palm is flat towards the camera, the on-screen target should pass over any blocks without changing their position and drop any held block.

In contrast, when the user's fingers are curled, or their hand forms a fist, any held block should continue to be moved. If no block is being held, passing over a block with a closed hand should pick it up and drag it along with the on-screen cursor.

Introducing the isHandOpen library

The implementation of the pseudo-code above depends on accurately determining whether or not the user's hand is open. To make this logic reusable, I've compiled a pure JavaScript library that uses the predictions.annotations object to return a true or false value for whether the user's hand is open. Here's a link to the repository. Note that the Touchless Interface repository already ships with isHandOpen included. If you are not using the Touchless Interface repository, download the minified isHandOpen JavaScript file and reference it with <script src="isHandOpen.min.js"></script> right above any local JavaScript tag.

This defines the isHandOpen function, which takes two parameters: the predictions.annotations object and the optional HAND_OPEN_BUFFER. The second parameter controls how many times in a row the hand must be detected as open before isHandOpen returns true. By default, HAND_OPEN_BUFFER is null, meaning the function returns exactly what it predicts, every time. The recurring bug I found with that default is the application dropping blocks while the user's hand is still closed. The HAND_OPEN_BUFFER effectively eliminates these false positives, at the cost of dropping items HAND_OPEN_BUFFER calls to isHandOpen later, which in this case is a handful of milliseconds.
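
Usage boils down to a single call per frame, for example (assuming prediction comes from estimateHands as above):

// With a buffer of 2, the hand must be detected as open on two consecutive
// calls before isHandOpen reports true, filtering out one-off false positives.
const handOpen = isHandOpen(prediction.annotations, 2);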

Dragging Items

A simple call to isHandOpen with the predictions.annotations object and a HAND_OPEN_BUFFER of 2 returns a Boolean for whether the user's hand is open. The next step is to allow the user to move the on-screen blocks. To pick up a block, the user's hand must be closed and the cursor (target) must be close to the block. If the hand is closed and no item is currently held, the canGrabItem function is executed; it compares the target's top and right style values against every item on the screen and finds the closest object that can be picked up. If that object is close enough, it is set as the value of the global variable holding the currently held item. That value is then read on subsequent iterations of the main loop, which displaces the block along with the target cursor, moving it along with the user's hand.
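
Putting it together, the grab/move/drop decision inside the main loop might look roughly like this; heldItem and canGrabItem follow the description above, but their exact names and signatures are assumptions:

const handOpen = isHandOpen(prediction.annotations, 2);

if (handOpen) {
  heldItem = null;                  // open hand: drop whatever is being held
} else if (!heldItem) {
  heldItem = canGrabItem(target);   // closed hand near a block: pick it up (or null)
}

if (heldItem) {
  // Displace the held block along with the target cursor
  heldItem.style.right = target.style.right;
  heldItem.style.top = target.style.top;
}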

Summary

Touchless Interface utilizes the HandPose model from Tensorflow.js to locate a user's hand in a webcam feed and projects its position onto an on-screen cursor. It then integrates the isHandOpen library to determine whether or not the user's hand is open. This information allows the user to grab, move, and drop on-screen widgets with the movement of their fingers and hand.

Implications

Touchless Interface in the browser is representative of the wide array of applications for this technology. From touchless solutions for public areas to software immersion at a new level, touchless interfaces have the capability to revolutionize how humans interact with computers.


Creating a Touchless Interface with Tensorflow.js was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
