


Revolutionary Computer Vision

Last Updated on March 13, 2024 by Editorial Team

Author(s): Enos Jeba

Originally published on Towards AI.

Amazing projects like the ones below convinced me that computer vision is the field I would most enjoy working in.

I like diving into different fields, connecting the dots, and understanding how each field invisibly supports another.

Photos aren’t just photos; they are arrays of numerical values laid out in a grid of pixels and color channels. Yes, and this thought pushed me to get into computer vision.

When I got started with deep learning and image processing, I found some amazing computer vision projects already implemented in the real world. Let me walk you through a few of them. Let’s start with

Face ID


The most popular computer vision project to reach everyone’s hands is the iPhone’s face recognition. With the introduction of the iPhone X, Apple added a camera system that unlocks the phone just by looking at it.

The Face ID feature lets people unlock the phone with a glance. You register your face once, and then you are good to go.

In typical Apple fashion, facial recognition was given serious attention to avoid false unlocks from photos or other pictorial representations.

Many journalists ran creative experiments to fool the recognition algorithm, but it was not easy to trick. It even worked in the dark.

To implement Face ID, Apple built a camera system called the TrueDepth camera system.

The True Depth Camera System consists of

  1. Infrared camera
  2. Flood illuminator
  3. Front Camera
  4. Dot projector
Source — Apple

Here’s how it works.

  1. The flood illuminator lights up your face with invisible infrared light (so detection works even in the dark)
  2. The infrared camera takes an IR image of your face
  3. The dot projector joins in to project around 30,000 dots onto your face
  4. The IR image and the dot pattern are then pushed into a neural network to create a mathematical model of your face
  5. This mathematical model is matched against the model stored when you set up Face ID, and the phone unlocks on a match
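The matching step above can be sketched as comparing face embeddings. This is a minimal illustration with toy vectors and an assumed threshold; the real Face ID model and its thresholds are not public.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_match(enrolled: np.ndarray, probe: np.ndarray, threshold: float = 0.8) -> bool:
    """Unlock only if the probe embedding is close enough to the enrolled one."""
    return cosine_similarity(enrolled, probe) >= threshold

# Toy example: two nearly identical embeddings should match.
enrolled = np.array([0.9, 0.1, 0.4])
probe = np.array([0.88, 0.12, 0.41])
print(is_match(enrolled, probe))  # True for these toy vectors
```

The threshold trades off convenience against security: higher values mean fewer false unlocks but more failed attempts.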

The standout feature is that it adapts to facial changes and continues to recognize you after you have grown a beard or put on glasses.


Things we can pick up from this project

User Experience

Also known as the Apple way, Face ID was a simple yet remarkable approach to interacting with technology and harnessing the good in it. It was implemented in a user-friendly way that fit seamlessly into our lifestyle.

When we make our algorithms usable, the audience ready to use them becomes wider. And the wider the audience, the more useful your product is.

Of course, you also get the added advantage of more data. But why more data? Let’s see that next.

Retraining the Model

Retraining is the most overlooked part of machine learning. Once a model is trained, people (or stakeholders) expect it to run forever without failing.

But the world changes, and so do real-world entities like our data. Images are especially prone to change, and the probability that a face keeps exactly the same shape as before is low or non-existent.

If Face ID operated only on the data registered at setup, it would fail after a while, once our facial features change.

Thus, it should be retrained again and again to stay accurate, like a knife that needs to be sharpened to cut smoothly.

But we don’t keep rotating our heads through a new Face ID setup every time, right? Then how would it work?

Every time you unlock with your face, the captured image is also used for background training to keep the model up to date. Since changes like facial hair grow slowly, and you use face unlock every now and then, Face ID is always trained on recent data and can easily keep up with your face.
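The background-update idea can be sketched as an exponential moving average of the stored face template: each successful unlock nudges the template slightly toward the newest embedding. This is an assumed mechanism for illustration, not Apple's actual algorithm.

```python
import numpy as np

def update_template(template: np.ndarray, new_embedding: np.ndarray,
                    alpha: float = 0.05) -> np.ndarray:
    """Exponential moving average: slowly drift the stored face template
    toward embeddings from recent successful unlocks."""
    return (1 - alpha) * template + alpha * new_embedding

template = np.array([1.0, 0.0])
# A beard grows in slowly: each unlock nudges the template.
for _ in range(50):
    template = update_template(template, np.array([0.8, 0.3]))
print(template)  # has drifted most of the way toward [0.8, 0.3]
```

A small alpha makes the template robust to one-off bad captures while still tracking slow changes like a growing beard.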

Out of the box

Apple did not limit computer vision to just cameras. They explored all the possibilities of imaging sensors and created a whole new sensor setup that worked properly within the small space of the phone.

Apple went out of its way to ensure the feature worked properly, taking security seriously and understanding that even a small flaw could damage the iPhone’s standing in the market.

One exposed flaw would mean a billion devices carrying the same vulnerability, exploitable in unthinkable ways and posing a huge security threat. Apple, as always, put in the effort, took its features seriously, and left the competition behind.

There are thousands of electronic components in the world, and attaching one to another is like a fun Lego version of technology. If we consider our use case and play around with components to make it work effectively, we could well build another product for history to look back on.

You can see Face ID working in real life through an infrared camera.

Xbox Kinect

“The device that lets you be the controller”

is the perfect description of the Xbox Kinect. Microsoft printed “The Controller you are” on some game boxes, highlighting how game characters mimic your actual body movements.

Launched with the Xbox 360, Kinect is a powerful implementation of Computer Vision and body tracking.

With Xbox Kinect, you could navigate menus using your hand as a pointer, scrolling through the OS and clicking on options. If you did not want to use your hand, you could switch to voice control.

When an enemy came rushing at you, you would now swing your arms to fight him off instead of pressing a button like you normally did.

With multiplayer games, a friend could play right beside you: you kick the football into the air while your friend tries to save the goal. You could even scan your skateboard into the game and ride it virtually.


In games like Dance Central Spotlight, you can dance, and the character in the game will copy your moves.

Raj and Howard using Kinect while Leonard stares at the scores

Kinect was also being used in schools to interactively teach young minds.

Source — crowbcat

Kinect was powered by a combination of hardware and software.

The multi-camera setup did two things:

  1. Generated a 3D moving image of the objects in its field of view
  2. Recognized (moving) human beings among those objects.

The camera uses a technique called time-of-flight to distinguish objects from the background. It emits near-infrared light that is invisible to the human eye and measures how long it takes to bounce back after hitting the objects. This is similar to how sonar works: The longer the light takes to return, the farther away the object is.
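The time-of-flight math in that paragraph boils down to one formula: the light travels to the object and back, so the distance is half the round trip multiplied by the speed of light.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def tof_distance(round_trip_seconds: float) -> float:
    """Time-of-flight: light travels to the object and back,
    so the object's distance is half the round-trip path."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2

# Light returning after ~20 nanoseconds means the object is ~3 m away.
print(tof_distance(20e-9))  # ≈ 2.998 m
```

The nanosecond scale of these round trips is why time-of-flight sensors need very fast, specialized timing hardware.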


Kinect Software

The Kinect had a separate onboard processor running algorithms to process the data coming in from the cameras and render the 3D image.

It could also recognize people and identify human body parts, their joints, and their movement. The software placed an estimated skeleton on the detected human, so it knew the person’s pose and the virtual character could do the same.

It could also tell one person from another, so when you faced it, it knew who you were.
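The estimated skeleton can be sketched as a map of named joints in image coordinates, which is then scaled into the virtual character's space. The joint names and coordinates here are hypothetical, not Kinect's actual API.

```python
# Hypothetical joint positions (normalized image coordinates) as a
# tracker might output them for one frame.
skeleton = {
    "head": (0.50, 0.10),
    "left_hand": (0.30, 0.45),
    "right_hand": (0.70, 0.45),
    "left_foot": (0.40, 0.95),
    "right_foot": (0.60, 0.95),
}

def to_avatar_space(skeleton, width=640, height=480):
    """Map normalized joints into the virtual character's pixel space
    so the on-screen model can mirror the player's pose."""
    return {name: (x * width, y * height) for name, (x, y) in skeleton.items()}

print(to_avatar_space(skeleton)["head"])  # (320.0, 48.0)
```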


Things to admire in Kinect

The software and hardware integration is amazing.

Replicating your movements in near real time with a 3D model on screen is a complex thing to achieve.

The cameras identifying each body part were also essential for the virtual models to make the same moves.

You can watch Dr Andrew Fitzgibbon, a key person in developing the technology behind Kinect, give a seminar on building computer vision systems at the University of Surrey, Guildford.

Face tracking cameras

Keeping your face in the center of the video frame makes a huge difference when you are on a video call. The person on the other side is more comfortable seeing your natural pose than half of your face, or a corner of your home with you at the edge of the frame.

The main objective was to keep the face in the center of the video frame. Since webcams are almost always stationary, you had to sit right in front of them.

We could detect your face from the webcam feed, but if you moved, the webcam could not move or change direction, and that is where face tracking cameras came in.

These cameras sit on a motorized base that can rotate. A face tracking model locates the person in the frame, estimates the position, and sends signals to the motorized base telling it how to move to re-center the face on screen.
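The re-centering loop can be sketched as simple proportional control: the farther the face sits from the frame center, the larger the correction sent to the motors. Gains and sign conventions here are assumptions for illustration.

```python
def recenter_command(face_x: float, face_y: float,
                     frame_w: int = 640, frame_h: int = 480,
                     gain: float = 0.1):
    """Proportional controller: step size grows with the face's
    offset from the frame center (signs depend on motor wiring)."""
    error_x = face_x - frame_w / 2
    error_y = face_y - frame_h / 2
    pan_step = -gain * error_x
    tilt_step = -gain * error_y
    return pan_step, tilt_step

# Face detected right and above center: pan and tilt back toward it.
print(recenter_command(420, 200))  # (-10.0, 4.0)
```

Real trackers usually add smoothing or a dead zone around the center so the camera does not jitter with every small head movement.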


Some cameras also use zoom: zooming out if the person is too close and zooming in if the person is at a distance.

Later we also saw cameras mounted on small gimbal systems controlled by the live feed from the camera. This meant you could move around rather than sitting in one place for video conferencing.

DJI Pocket 2

Content creators were soon offered cameras that keep their faces in the center so that their vlogs come out better.

Looking at the camera and talking now seemed much easier, with an automated companion tracking your face and presenting you well.

Center Stage

Surprisingly, Apple also took a shot at this with its existing hardware. Using the ultrawide camera and heavy software processing, Apple manages to keep people in the center of the frame, a feature it calls Center Stage.

It cannot physically rotate like the gimbals or motorized cameras; instead, a single wide image is digitally cropped and panned to produce the desired natural effect.
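Digital panning can be sketched as cropping a window around the face from the wide frame, clamped so the crop never leaves the sensor area. The resolutions below are assumptions for illustration, not Apple's actual pipeline.

```python
import numpy as np

def digital_pan(frame: np.ndarray, face_cx: int, face_cy: int,
                out_w: int = 1280, out_h: int = 720) -> np.ndarray:
    """Crop a window centered on the face, clamped to the frame
    borders: a rough stand-in for a digital pan."""
    h, w = frame.shape[:2]
    x0 = min(max(face_cx - out_w // 2, 0), w - out_w)
    y0 = min(max(face_cy - out_h // 2, 0), h - out_h)
    return frame[y0:y0 + out_h, x0:x0 + out_w]

# 4K-ish ultrawide frame; face near the left edge.
frame = np.zeros((2160, 3840, 3), dtype=np.uint8)
crop = digital_pan(frame, face_cx=300, face_cy=1080)
print(crop.shape)  # (720, 1280, 3)
```

Because the crop comes from a much larger sensor image, the output still has enough pixels to look sharp after the virtual "pan".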


Things we can grab from these projects:

  • We don’t have to stop at combining electrical components; we can also make the components bring out the best of our algorithm by sending them specific instructions to perform. I mean, why not?

Windows Hello

Windows Hello was made to let laptop users sign in without even touching the device. People naturally open the lid and face the screen, so Microsoft decided to unlock the device at that first glance and came up with Windows Hello.

When Windows Hello came to market, demand for notebook laptops was increasing, and people were getting comfortable with lightweight devices that could easily slide into their bags.

Windows Hello works with infrared cameras. It reads facial movements to ensure the liveness of a person, and it can even sense body heat to confirm you are a living person, so photographs are not going to work.

It was also claimed that during testing it differentiated between identical twins.

It is also claimed to measure features such as the distance between your eyes and your forehead; it is not limited to a single picture.


Things we can pick up from Windows Hello

  1. User Experience improves the value of the algorithm.
  2. It does not always have to be just one image; you could also use a stack of images to improve the accuracy score.
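Point 2 can be illustrated by fusing embeddings from a stack of frames: noise in any single frame washes out in the average, which tends to raise recognition accuracy. This is a toy sketch with synthetic vectors, not Windows Hello's actual pipeline.

```python
import numpy as np

def fused_embedding(frame_embeddings):
    """Average embeddings from several frames, then renormalize
    to unit length; per-frame noise cancels out in the mean."""
    mean = np.stack(frame_embeddings).mean(axis=0)
    return mean / np.linalg.norm(mean)

rng = np.random.default_rng(0)
clean = np.array([0.6, 0.8])  # the "true" face direction
noisy = [clean + rng.normal(0, 0.1, size=2) for _ in range(8)]
print(fused_embedding(noisy))  # close to the clean direction (0.6, 0.8)
```

Averaging over N frames shrinks the noise roughly by a factor of the square root of N, which is why a short burst of captures beats a single shot.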

Autonomous Vehicle

Tesla. Yes, the popular electric vehicle with the capability to drive around on its own.

Equipped with multiple cameras, electric vehicles now possess the ability to drive, making decisions based on multiple real-time video feeds bundled with object detection, lane judgment, pedestrian detection, and more.

With multiple cameras, the algorithm becomes aware of its surroundings. It can now answer questions like:

  • How far behind is the car following us?
  • Is anyone walking beside the vehicle?
  • What is in front of the car?
  • Is anyone crossing in front?
  • Where is the white line on the road?

along with much more essential information for making decisions in the middle of the road.
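The per-frame decision flow could look like the sketch below. `detect_objects` is a hypothetical stand-in for a real detector (for example, a YOLO-family model); it is not a real API, and the planning rule is deliberately simplistic.

```python
def detect_objects(frame):
    """Placeholder detector: a real one returns (label, box, score)
    tuples computed from the camera frame."""
    return [("car", (100, 200, 300, 400), 0.97),
            ("pedestrian", (500, 220, 560, 420), 0.91)]

def plan(frame):
    """Turn detections into simple driving signals."""
    detections = detect_objects(frame)
    brake = any(label == "pedestrian" and score > 0.5
                for label, _, score in detections)
    return {"brake": brake, "objects": [d[0] for d in detections]}

decision = plan(frame=None)
print(decision)  # {'brake': True, 'objects': ['car', 'pedestrian']}
```

A real stack fuses several camera feeds plus other sensors before planning, but the loop is the same: perceive, decide, act, on every frame.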

And with additional sensors such as GPS, we have our location on the map. GPS also helps prevent theft.

Autonomous Levels

Here is a commonly cited breakdown of the levels of driving automation, adapted from a description I found online:

  1. Driver Assistance — Operated by a human driver, but with assistance from an automated system for steering or acceleration (but not both)
  2. Partial Automation — automated assistance is offered for both steering and acceleration, but the driver is still responsible for safety-critical actions.
  3. Conditional Automation — sensors monitor the surrounding environment, and the additional activities are automated such as braking. The driver must be ready to intervene should the system fail.
  4. High Automation — the vehicle can operate fully autonomously but the mode can only be activated under specific conditions.
  5. Full Automation — the driver specifies the destination, and the car does the rest on its own. At this level, there is no need for a steering wheel or pedals.

And with computer vision, we are getting ever closer to Full Automation.

Assuming you have been inspired to level up your computer vision projects with current industrial methods, let’s revolutionize again.

