The potential of augmented reality glasses has been a topic of interest on this site, yet these devices have not become commonplace for daily use. They often appear bulky or provide limited functionality. However, the integration of AI could redefine their purpose beyond mere smartphone screen emulation. Envision them focusing on capturing images and sounds, then offering information through audio output. Such a design would require glasses that look nearly identical to traditional models but are outfitted with a camera and microphone, capable of transmitting sound via the frames, all driven by an artificial intelligence engine. Meta and other manufacturers are pursuing this approach, leading to a new generation of smart glasses.
Meta makes the leap to AI-enabled glasses
Upon releasing its Ray-Ban Meta smart glasses in late 2023, Meta encountered a tepid market response. The concept of glasses equipped with a camera for capturing photos and videos was perceived more as a novelty for influencers than a must-have gadget, not to mention the privacy concerns it raised. Yet a pivotal shift occurred in December of that year: Meta, the conglomerate behind Facebook, revealed plans to incorporate multimodal artificial intelligence capabilities, mirroring Google’s endeavors with its Gemini AI. This advancement meant that the glasses would transcend basic command responses and media capture; they would begin analyzing images, unlocking previously unimaginable functionality.
What is multimodal artificial intelligence?
Multimodal artificial intelligence marks a significant advance in AI systems: it integrates and processes several data types, including text, images, sound, and video. This approach enables AI to comprehend and engage with the world in a more nuanced way. In contrast to unimodal systems, which focus on a single data type, multimodal AI can simultaneously interpret complex information from multiple sources, allowing it to perform tasks with a level of accuracy and insight that was previously out of reach.
For smart glasses, this evolution means that typing text or issuing narrow voice commands is no longer necessary. Instead, these devices can analyze the scene in front of the user and provide information based on the visual data they gather. The range of applications is vast, opening new ways to interact with and understand our surroundings.
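To make the idea concrete, here is a minimal sketch in Python of what a multimodal query might look like. The `MultimodalQuery` type and `answer` function are hypothetical placeholders, not any vendor's actual API; real systems such as Gemini or the model behind Meta's glasses expose their own interfaces.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MultimodalQuery:
    """One request combining several data types, as the glasses would send it."""
    image_jpeg: bytes            # frame captured by the glasses' camera
    audio_wav: Optional[bytes]   # optional clip from the microphone
    text_prompt: str             # the user's spoken question, transcribed

def answer(query: MultimodalQuery) -> str:
    """Hypothetical stand-in for a multimodal model call.

    A unimodal system would accept only one of the fields above;
    a multimodal one reasons over all of them at once.
    """
    return (f"(model output for {query.text_prompt!r}, "
            f"given {len(query.image_jpeg)} image bytes)")

if __name__ == "__main__":
    query = MultimodalQuery(
        image_jpeg=b"\xff\xd8\xff",   # stand-in for real JPEG data
        audio_wav=None,
        text_prompt="What plant is in front of me?",
    )
    print(answer(query))
```

The point is simply that a single request can carry an image, an audio clip, and text together, which is exactly what lets the glasses answer questions about whatever is in front of the wearer.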
What does the new generation of glasses allow you to do?
The glasses from Meta and other companies such as Brilliant Labs or Envision typically require a connection to a smartphone, which handles the heavy computing. Currently, the models on the market are limited to analyzing photographs. Once the multimodal AI processes the image, it enables capabilities such as the following (a simplified sketch of this capture-and-describe loop appears after the list):
- Provide recipe suggestions based on the ingredients available in the refrigerator.
- Detail the nutritional values of a food item.
- Indicate the store where a clothing item or object can be purchased.
- Diagnose a household malfunction and suggest possible solutions.
- Identify plants or animals.
- Read and translate texts.
- Act as an interpreter for conversations with speakers of other languages.
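As a rough illustration of the flow just described, the following Python sketch strings together the three stages: capture on the glasses, analysis on the paired smartphone, and spoken output through the frames. Every function here is a hypothetical stand-in; none corresponds to a real SDK from Meta, Brilliant Labs, or Envision.

```python
def capture_photo() -> bytes:
    """Placeholder: the glasses' camera would return a JPEG frame here."""
    return b"\xff\xd8\xff"  # stand-in for real image data

def describe_image(photo: bytes, question: str) -> str:
    """Placeholder: the paired smartphone forwards the photo and the
    user's question to a multimodal model and returns its text answer."""
    return "Tomatoes, eggs, and spinach: you could make a frittata."

def speak(text: str) -> None:
    """Placeholder: audio is played back through the speakers in the frames."""
    print(f"[glasses audio] {text}")

if __name__ == "__main__":
    photo = capture_photo()
    reply = describe_image(photo, "What can I cook with these ingredients?")
    speak(reply)
```

Splitting the work this way keeps the glasses themselves light: they only need a camera, a microphone, speakers, and a wireless link, while the smartphone (or the cloud behind it) runs the multimodal model.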
Several applications, such as recipe generation, are already available, while others, including some yet to be developed, will gradually become a reality. For instance, the Brilliant Labs glasses feature a micro-OLED display that enables augmented reality applications, such as visualizing a sofa in a different color. That said, there is one area where these devices could prove genuinely life-changing.
A leap in accessibility
Individuals with vision impairments or blindness have rapidly recognized the transformative potential of this technology. They can now inquire about anything within their field of vision—be it an object, a person, or text—and the glasses will provide a detailed explanation. Beyond AI glasses, innovative wearables that eschew traditional lenses in favor of a camera-equipped headset design are being developed.
One notable innovation comes from the National University of Singapore, where researchers have created a headset featuring a 13-megapixel camera. The device captures images on the user’s command, and the integrated AI then analyzes the photographed object’s size, shape, and color. Distinctively, this model operates independently, without needing to connect to a smartphone or any other external device.
Although this particular headset, which delivers sound through the bones of the skull, is not yet available, other models from Meta and various manufacturers are already on the market. These devices promise to significantly enhance the quality of life for people with disabilities, offering them unprecedented levels of independence and interaction with their environment.