Revolutionary Computer Vision
Last Updated on March 13, 2024 by Editorial Team
Author(s): Enos Jeba
Originally published on Towards AI.
These amazing projects inspired me to see computer vision as the field I would be most comfortable working in.
I like getting my head into different fields, trying to connect the dots, and understanding how each field invisibly supports another.
Photos aren't really photos; they are just numerical values laid out in a cube of rows, columns, and color channels. That thought pushed me into computer vision.
When I got my feet wet in deep learning and image processing, I found some amazing computer vision projects already deployed in the real world. Let me walk through a few of them, starting with
Face ID
The most popular computer vision project to reach everyone's hands is the iPhone's face recognition. With the introduction of the iPhone X, Apple added a camera system that unlocks the phone when you look at it.
The Face ID feature lets people unlock the phone with a glance. You register your face once, and then you are good to go.
In true Apple fashion, facial recognition was given serious attention to avoid false unlocks using photos or other pictorial representations.
Many journalists tried creative experiments to fool the recognition algorithm, but it was not easy to trick. It even worked in the dark.
To power Face ID, Apple built a camera system called the TrueDepth camera system.
The TrueDepth camera system consists of:
- Infrared camera
- Flood illuminator
- Front Camera
- Dot projector
Here's how it works:
- The flood illuminator lights up your face with infrared light so it can be detected, even in the dark
- The infrared camera takes an IR image of your face
- The dot projector joins in to project around 30,000 invisible dots onto your face
- The IR image and the dot pattern are then pushed into a neural network to create a mathematical model of your face
- This mathematical model is matched against the one stored when you set up Face ID, and the phone unlocks on a match (a minimal matching sketch follows below)
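Apple has not published how Face ID works internally, so the following is only a rough sketch of the general pattern: a network turns the capture into an embedding, and matching is a distance check. The embedding model, similarity measure, and threshold below are all assumptions for illustration.

```python
import numpy as np

def embed(face_capture, model):
    """Turn the IR image + depth data into a fixed-length mathematical model
    of the face. `model` stands in for the (unpublished) neural network."""
    return model(face_capture)

def is_same_person(enrolled, candidate, threshold=0.6):
    """Compare the stored enrollment embedding with a fresh capture.
    Cosine similarity above the (assumed) threshold counts as a match."""
    similarity = np.dot(enrolled, candidate) / (
        np.linalg.norm(enrolled) * np.linalg.norm(candidate))
    return similarity > threshold
```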
The highlight is that it adapts to facial changes and keeps recognizing you after you have grown a beard or put on glasses.
Things we can pick up from this project
User Experience
Face ID was a simple yet remarkable approach to interacting with technology, done the Apple way: user-friendly and seamlessly woven into our lifestyle.
When we make our algorithms usable, the audience ready to use them becomes wider. The wider the audience, the more useful the product.
Of course, you also get the added advantage of more data. But why more data? Let's see that next.
Retraining the Model
Retraining is the most overlooked part of machine learning. Once a model is trained, people (or stakeholders) expect it to run forever without failing.
But the world changes, and so does real-world data. Images are prone to change, and the chance that new captures will look exactly like the old ones is low to non-existent.
If Face ID operated only on the data registered at setup, it would start failing once our facial features change.
Thus, it should be retrained again and again to keep working accurately, like a knife that needs to be sharpened to cut smoothly.
But we don't keep redoing the Face ID setup and rotating our heads every few weeks, right? So how does it stay accurate?
Every time you unlock with your face, the captured image is also used for background updates to keep the model current. Because features such as facial hair grow slowly, and you unlock the phone every now and then, Face ID always has recent data of your face and can keep recognizing it.
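Apple has not disclosed how this background update works. One common way to keep a stored template current, sketched here purely as an assumption, is to blend the embedding from each successful unlock into the enrolled one with a small weight:

```python
import numpy as np

def update_enrollment(enrolled, new_unlock, alpha=0.05):
    """Blend the embedding from a successful unlock into the stored template.
    A small alpha means the template drifts slowly, tracking gradual changes
    such as a growing beard without being hijacked by a single bad capture."""
    updated = (1 - alpha) * enrolled + alpha * new_unlock
    return updated / np.linalg.norm(updated)  # keep it unit length for cosine matching
```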
Out of the box
Apple did not limit computer vision to just cameras. They explored all the possibilities of imaging sensors and created a whole new sensor setup that works properly within the small space of a phone.
Apple went out of its way to ensure the feature worked properly, taking security seriously and understanding that even a small flaw could hurt the iPhone worldwide.
One exposed flaw would mean a billion devices sharing the same vulnerability, exploitable in unthinkable ways and posing a huge security threat. Apple, as usual, put in the effort, took the feature seriously, and left the competition behind.
There are thousands of electronic components in the world, and combining them is like playing with a Lego version of technology. If we consider our use case and experiment with components until the solution works effectively, we may well build another product for history to look back on.
You can see Face ID working in real life through an infrared camera.
Xbox Kinect
"The device that lets you be the controller"
is the perfect way to describe the Xbox Kinect. Microsoft printed "The Controller You Are" on one of the game boxes, highlighting how game characters mimic your actual body movements.
Launched alongside the Xbox 360, the Kinect is a powerful implementation of computer vision and body tracking.
With the Kinect, you could move across menus using your hand as a pointer, scroll through the OS, and click on options. If you did not want to use your hands, you could switch to voice control.
When an enemy came rushing at you, you would swing your arms to strike instead of pressing a button as you normally would.
In multiplayer games, a friend could stand right beside you and play along: you kick the football into the air while your friend tries to save the goal. You could even scan your skateboard into the game and ride it virtually.
In games like Dance Central Spotlight, you can dance, and the character in the game will copy your moves.
The Kinect was also used in schools to teach young minds interactively.
The Kinect was powered by a combination of hardware and software.
The multi-camera setup did two things:
- Generate a 3D moving image of the objects in its field of view
- Recognize (moving) human beings among those objects
The camera uses a technique called time-of-flight to distinguish objects from the background. It emits near-infrared light that is invisible to the human eye and measures how long it takes to bounce back after hitting the objects. This is similar to how sonar works: The longer the light takes to return, the farther away the object is.
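The arithmetic behind time-of-flight is simple: the pulse travels to the object and back, so the distance is half the round trip at the speed of light. A minimal sketch:

```python
# Time-of-flight in one line of physics: distance = (speed of light * round-trip time) / 2.
SPEED_OF_LIGHT = 299_792_458  # metres per second

def distance_from_round_trip(round_trip_seconds):
    """Convert the measured round-trip time of the emitted IR pulse
    into the distance of the object it bounced off."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2

# Example: a pulse that returns after ~20 nanoseconds hit something about 3 m away.
print(distance_from_round_trip(20e-9))  # ~3.0 metres
```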
Kinect Software
The Kinect had a separate processor onboard, running algorithms to process the data coming from the cameras and render the 3D image.
It could also recognize people and identify human body parts and their joints as they moved. The software placed an estimated skeleton on the detected person so that it knew the person's pose and the virtual character could mirror it.
It could also tell one person from another, so when you faced it, it knew who you were.
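The published Kinect research described classifying each depth pixel into a body part and then proposing joints from those regions. As a rough sketch (the label map and its conventions are assumptions here), a joint can be estimated as the centre of the pixels assigned to its body part:

```python
import numpy as np

def estimate_joints(depth_image, part_labels):
    """Given a depth image and a per-pixel body-part label map (produced by a
    separate classifier, not shown), estimate each joint as the centre of the
    pixels assigned to that part."""
    joints = {}
    for part_id in np.unique(part_labels):
        if part_id == 0:          # 0 = background, by assumption
            continue
        ys, xs = np.nonzero(part_labels == part_id)
        depths = depth_image[ys, xs]
        joints[int(part_id)] = (xs.mean(), ys.mean(), depths.mean())
    return joints
```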
Things to admire in Kinect
The software and hardware integration is amazing.
Replicating your movement in near real time with a 3D model on screen is a complex thing to achieve.
The cameras identifying each body part was also essential for the virtual models to make the same moves.
You can watch Dr Andrew Fitzgibbon, a key person in developing the technology behind the Kinect, give a seminar on building computer vision systems at the University of Surrey, Guildford.
Face tracking cameras
Keeping your face in the center of the video frame makes a huge difference when you are on a video call. The person on the other side is more comfortable seeing your natural pose than half your face, or a view of your home with you stuck at the edge of the frame.
The main objective is to keep the face in the center of the frame. Because webcams are almost always stationary, you had to sit right in front of one.
We could detect your face from the webcam feed, but if you moved, the webcam could not move or change direction with you, and that's where face tracking cameras paved their way in.
These cameras are attached to a motorized base that can rotate. A face tracking model locates the person in the frame, estimates the offset, and sends signals to the motorized base with the movement required to recenter the face on screen.
Some cameras also use zoom: zooming out if the person is too close and zooming in if the person is far away.
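A minimal sketch of that control loop, assuming OpenCV's built-in Haar face detector and a hypothetical send_to_motor_base function standing in for the hardware-specific motor protocol:

```python
import cv2

# Detect the face, measure how far it is from the frame centre, and turn
# that error into pan/tilt commands for the motorized base.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def recenter_step(frame, send_to_motor_base, deadband=20):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return                                   # nobody in view, do nothing
    x, y, w, h = faces[0]
    error_x = (x + w / 2) - frame.shape[1] / 2   # positive: face is right of centre
    error_y = (y + h / 2) - frame.shape[0] / 2   # positive: face is below centre
    if abs(error_x) > deadband or abs(error_y) > deadband:
        send_to_motor_base(pan=error_x, tilt=error_y)  # hypothetical motor interface
```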
Later we also saw cameras mounted on small gimbal systems controlled by the live feed from the camera itself. This meant you could move around instead of sitting in one spot for a video conference.
Content creators soon got cameras that kept their faces centered so their vlogs came out better.
Looking at the camera and talking now seemed easier, with an automated companion tracking your face and presenting you well.
Center Stage
Apple also took a shot at this with its existing hardware. With the help of the ultrawide camera and heavy software processing, it keeps people in the center of the frame, and it calls the feature Center Stage.
It cannot physically rotate like the gimbals or motorized cameras; instead, the single wide image is digitally cropped and reframed to produce the same natural effect.
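Apple has not published how Center Stage works, but the basic idea of digital reframing can be sketched as cropping a window around the detected face in the wide frame and scaling it to the output resolution (the margin and sizes below are assumptions):

```python
import cv2

def digital_reframe(wide_frame, face_box, out_size=(1280, 720), margin=1.5):
    """Crop a window around the detected face from the ultrawide frame and scale
    it to the output size: the whole 'camera movement' happens in software.
    face_box is (x, y, w, h) from any face detector; margin widens the crop."""
    x, y, w, h = face_box
    cx, cy = x + w / 2, y + h / 2
    crop_w, crop_h = int(w * margin * 2), int(h * margin * 2)
    # Aspect-ratio fitting and border clamping are omitted for brevity.
    x0 = max(0, int(cx - crop_w / 2))
    y0 = max(0, int(cy - crop_h / 2))
    crop = wide_frame[y0:y0 + crop_h, x0:x0 + crop_w]
    return cv2.resize(crop, out_size)
```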
Things we can grab from these projects
- Beyond just combining electronic components, we can make the components bring out the best in our algorithm by sending them specific instructions to perform. Why not?
Windows Hello
Windows Hello lets laptop users sign in without even touching the device. Our natural interaction with a laptop is to open the lid and face the screen, so Microsoft decided to unlock the device at that first glance, and Windows Hello was born.
When Windows Hello reached the market, demand for notebook laptops was increasing, and people were getting comfortable with lightweight devices that could easily slide into a bag.
Windows Hello works with infrared cameras. It reads facial movements to check that a live person is present, and it is said to detect body heat as well to confirm you are a living person, so photographs are not going to work.
It was also claimed that, during testing, it could differentiate between identical twins.
It is also claimed to measure features such as the distance between your eyes and your forehead, and it is not limited to a single picture.
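Microsoft has not published its liveness checks, but the idea that a static photograph fails where a living face passes can be illustrated with a toy heuristic (entirely an assumption, not Windows Hello's actual method): require a minimum amount of frame-to-frame change across a short burst of IR frames.

```python
import numpy as np

def passes_liveness(ir_frames, min_motion=2.0):
    """Toy liveness heuristic (not Microsoft's actual method): a live face shows
    small frame-to-frame changes (blinks, micro-movements) across a short burst
    of IR frames, while a static photograph barely changes at all."""
    diffs = [np.abs(b.astype(float) - a.astype(float)).mean()
             for a, b in zip(ir_frames, ir_frames[1:])]
    return float(np.mean(diffs)) >= min_motion
```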
Things we can pick up from Windows Hello
- User Experience improves the value of the algorithm.
- It does not always have to be just one image; you could also use a stack of images to improve the accuracy score.
Autonomous Vehicle
Tesla. Yes, a popular electric vehicle with the capability to drive around on its own.
Equipped with multiple cameras, these electric vehicles now possess the ability to drive, making decisions based on multiple real-time video feeds combined with object detection, lane detection, pedestrian detection, and more.
With multiple cameras, the algorithm becomes aware of its surroundings. It now knows:
- How far away is the car behind?
- Is there anyone walking beside me?
- What's in front of the car?
- Is anyone walking in front?
- Where's the white line on the road?
along with much more essential information needed to make decisions in the middle of the road.
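One small piece of such a perception stack, pedestrian detection on a single feed, can be sketched with OpenCV's classic HOG person detector. A real self-driving system runs far heavier neural networks across many synchronized cameras, so this is only an illustration:

```python
import cv2

# Classic HOG + linear SVM person detector, used here as a lightweight stand-in
# for the neural networks a real autonomous vehicle would run.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def pedestrians_in_frame(frame):
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))
    return [tuple(box) for box in boxes]     # (x, y, w, h) for each detected person

cap = cv2.VideoCapture(0)                    # any one of the car's camera feeds
ok, frame = cap.read()
if ok:
    print(pedestrians_in_frame(frame))
cap.release()
```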
And with additional sensors such as GPS, the car also knows its location on the map. GPS helps prevent theft as well.
Autonomous Levels
I picked up this summary of the levels from the internet:
- Level 1, Driver Assistance: operated by a human driver, with assistance from an automated system for steering or acceleration (but not both).
- Level 2, Partial Automation: automated assistance for both steering and acceleration, but the driver is still responsible for safety-critical actions.
- Level 3, Conditional Automation: sensors monitor the surrounding environment, and additional activities such as braking are automated, but the driver must be ready to intervene should the system fail.
- Level 4, High Automation: the vehicle can operate fully autonomously, but the mode can only be activated under specific conditions.
- Level 5, Full Automation: the driver specifies the destination, and the car does the rest on its own. At this level, there is no need for a steering wheel or pedals.
And with computer vision, we are getting ever closer to Full Automation.
Assuming you have been inspired to level up your computer vision projects with these industrial methods, let's go revolutionize again.