My Experience with the Apple Vision Pro and Future Perspectives in Computer Vision and Healthcare
Author(s): Alberto Paderno
Originally published on Towards AI.
"It was a rainy day in Palo Alto. The line in front of the Apple Store was interminable, and only a few strong-willed people were persistent enough to wait for hours for a shot at securing one of the first commercially available Apple Vision Pro units. Each step brought me closer to a revolution in computing…"
OK, let's be real. This sounds great as an opening, but I would never wait in line for any commercial product. However, thanks to the amazing Digital Health team of the Stanford Byers Center for Biodesign, I was able to try the new Apple Vision Pro and discuss its potential in computer vision and healthcare.
Biodesign for Digital Health: biodesign.stanford.edu
Why Computer Vision and Healthcare?
I won't talk about manufacturing quality. Apple usually does an excellent job, and its new top-of-the-line product does not underdeliver. There are already plenty of professional reviews covering that, and it's not really my field. Let's look at things from a computer vision and AI perspective: what is the potential, what should we expect, and what are the future lines of research?
And, of course, being a surgeon, it's difficult not to think about how this type of technology will develop in healthcare. My head has bumped into enough screens in the operating room to realize that they might not be the best solution for our current applications. Screens are everywhere in healthcare: endoscopic, laparoscopic, exoscopic, and robotic surgery, patient monitoring (ECG, saturation, ventilation), radiology, etc. And if you think the Apple Vision Pro is expensive, you should look at the prices of medical-grade monitors.
But screens were a necessary evil that allowed us to shift from purely optical visualization technologies (e.g., optical microscopes and loupes) to a direct digital input: the dream of every computer vision researcher. With this type of input, it's possible to collect training data (data that we previously threw away!), analyze procedures, and develop AI applications that are valuable in clinical practice.
From Eyes to Algorithms
While my initial focus was on testing visual quality, I was struck by how intuitive the experience was, and I found myself more impressed by some less-discussed features:
– Eye-tracking
– Passthrough quality
– Hand/pinch detection and tracking
These are the elements that make the experience particularly seamless and that, together with the concept of "Spatial Computing," will revolutionize current UI/UX design standards. But they also made me more aware of the possible interactions with computer vision and AI.
Let me explain.
Eye-tracking is not just the "new mouse"; it's a data source
Attention is a central concept in AI and computer vision. The paper "Attention Is All You Need" brought it to the fore by introducing the Transformer architecture for natural language processing, and "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale" transferred it to the visual domain. We are talking about a mathematical form of attention, but the broad meaning carries over.
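For readers less familiar with the mechanism, here is a minimal NumPy sketch of the scaled dot-product attention that both papers build on. The shapes and token values are purely illustrative and are not taken from either model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute attention weights and return the weighted sum of values.

    Q, K, V: arrays of shape (sequence_length, d_k). In a Vision Transformer,
    each "token" is the embedding of a flattened 16x16 image patch.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise similarity between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax: how much each token "attends" to the others
    return weights @ V                                   # each output is a weighted mix of all values

# Toy example: 4 tokens with 8-dimensional embeddings (self-attention)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
output = scaled_dot_product_attention(tokens, tokens, tokens)
print(output.shape)  # (4, 8)
```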
Eye movements are the external manifestation of human visual attention. For the first time, we (well, Apple) will have an instrument that tracks and records eye movements prospectively and over the long term. This is an exciting prospect, since eye micromovements are largely involuntary and will help us understand how we perceive the visual world, with potential implications for the way we structure and evaluate computer vision algorithms. On the other hand, it's scary to consider that, by virtue of their subconscious nature, eye movements might help profile customers or track what a person is thinking: a sort of "you are what you look at" concept.
Finally, medical conditions (especially neurologic and balance disorders) can influence eye micromovements and eye-hand coordination. Constant tracking may be a beneficial "opportunistic screening" tool to diagnose these conditions early.
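As a purely hypothetical illustration of what "eye movements as a data source" could mean in practice, the sketch below turns a synthetic gaze stream into two crude summary markers (saccade rate and fixation dispersion). The data format, sampling rate, and threshold are invented for the example and have no clinical validity; a real headset would expose its own eye-tracking stream.

```python
import numpy as np

# Hypothetical gaze samples: (timestamp_s, x_deg, y_deg) at ~100 Hz.
rng = np.random.default_rng(1)
t = np.arange(0, 5, 0.01)
gaze = np.column_stack([t,
                        rng.normal(0, 0.2, t.size),   # synthetic small involuntary movements
                        rng.normal(0, 0.2, t.size)])

def gaze_velocity(samples):
    """Angular velocity (deg/s) between consecutive gaze samples."""
    dt = np.diff(samples[:, 0])
    dxy = np.diff(samples[:, 1:3], axis=0)
    return np.linalg.norm(dxy, axis=1) / dt

def simple_markers(samples, saccade_threshold_deg_s=30.0):
    """Very rough summary statistics; the threshold is illustrative, not clinical."""
    v = gaze_velocity(samples)
    saccade_rate = np.mean(v > saccade_threshold_deg_s) * 100   # % of samples above threshold
    fixation_dispersion = samples[:, 1:3].std(axis=0).mean()    # spread of gaze positions
    return {"saccade_rate_pct": saccade_rate,
            "fixation_dispersion_deg": fixation_dispersion}

print(simple_markers(gaze))
```

Tracked longitudinally, even simple markers like these could flag changes worth a proper clinical work-up, which is what "opportunistic screening" means here.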
High passthrough quality brings computer vision to vision
One of the most striking elements is the impact of high passthrough quality on the overall experience. The objective is to eliminate the feeling of looking at a screen and to integrate digital elements into the real world. And the Apple Vision Pro gets really close to that.
This has been achieved thanks to the concomitant increase in resolution and quality of camera sensors and micro-OLED displays, and we are getting closer to the point where it will be impossible to tell whether we are looking through a digital camera plus a screen or through plain glass.
As a consequence, it will be possible to apply computer vision to every setting in everyday life, not just autonomous driving and other narrowly defined applications. Computer vision applications won't need a separate device (a smartphone, tablet, computer, or endoscope); the interaction will be direct.
Spatial computing is the perfect platform for computer vision.
Interfaces based on hand-eye input will change UI/UX design principles
Conventional interfaces are based on well-defined input devices (e.g., mouse, keyboard, trackpad). Here, everything in the visual field can potentially become an input source, starting from the hands and extending to the entire available space. Again, this is based on computer vision (eye and hand/gesture tracking above all): the entire video feed from the numerous cameras must be processed as an "input," reprocessed, and integrated with digital components (creating the "output"), shattering the conventional separation between input and output. This will significantly increase the interactions between applications, the user, and the environment, ultimately requiring new UI/UX design paradigms.
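To make the idea of gaze-plus-pinch input concrete, here is a small, purely illustrative sketch of how such an interaction loop could be structured. It does not use any real visionOS API; all names (GazeSample, HandSample, handle_frame) are hypothetical. The "pointer" is whatever element the eye-tracker says the user is looking at, and the "click" is a pinch detected by hand tracking.

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    target_id: str | None    # UI element currently under the user's gaze, if any

@dataclass
class HandSample:
    pinch: bool              # True while thumb and index finger are pinched

def handle_frame(gaze: GazeSample, hand: HandSample, ui_elements: dict):
    """One step of a hypothetical gaze+pinch interaction loop.

    Instead of a cursor position from a mouse, the "pointer" is the element
    the eye-tracker reports the user is looking at; the "click" is a pinch
    detected by hand tracking on the camera feed.
    """
    if hand.pinch and gaze.target_id in ui_elements:
        ui_elements[gaze.target_id]()   # trigger the action bound to the gazed element

# Toy usage: two "panels" bound to actions
ui = {"vitals_panel": lambda: print("Expanding vital signs panel"),
      "imaging_panel": lambda: print("Opening radiology viewer")}

handle_frame(GazeSample(target_id="vitals_panel"), HandSample(pinch=True), ui)
```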
Yes, And What About Healthcare?
As a physician and surgeon, it's difficult not to think about the potential revolution this technology could bring to healthcare, apart from the previously mentioned struggle between my head and floating monitors. The surgeon could easily position and look at 2D or 3D screens during endoscopic or exoscopic surgery, integrating the view with the patient's information from vital-sign tracking, radiologic imaging, and image-enhancement techniques. The view could be further extended with dedicated computer vision algorithms that detect instruments, tissues, and anatomical structures.
Finally, UI/UX design in healthcare is far from ideal. Ease of use and functional layouts are often low priorities when dealing with complex medical data. However, the advent of spatial computing offers a blank slate to build on, perhaps following better principles of design and usability.
The Spezi framework from Stanford caters to these needs thanks to its modular structure. Specifically, Spezi is an open-source framework for the rapid development of modern, interoperable digital health applications, and the team is already working on integrating applications on visionOS.
Spezi: spezi.sites.stanford.edu
In wrapping up my dive into the Apple Vision Pro and its intersection with computer vision and healthcare, it's clear we're on the cusp of a transformative period. This device isn't just about sharper images or smoother interfaces; it's about redefining our interaction with technology and its application in medicine. The Vision Pro exemplifies how technology can seamlessly integrate into our lives, offering insights that extend far beyond the screen.