The new “Live Camera” feature, expected to arrive soon in beta, will allow ChatGPT to “see” and engage with your surroundings in real time.
When OpenAI first introduced ChatGPT, the AI impressed with its ability to hold conversations, answer questions, and even help with a variety of tasks. Advanced Voice Mode later added a touch of friendliness, allowing users to have natural spoken conversations with the assistant. One promised capability was still missing, though: vision, which OpenAI showed off during the GPT-4o announcement in May 2024. It is meant to take ChatGPT beyond text and voice, letting it understand and interact with the real world through live video. It now appears that OpenAI has been working toward that feature, and it is ready to come out of the alpha phase.
ChatGPT vision feature rollout
According to several code strings, live vision capabilities in Advanced Voice Mode may be gearing up for a wider beta rollout. The strings referencing the feature were spotted in the latest ChatGPT beta build, v1.2024.317.
Dubbed “Live Camera,” the feature will let ChatGPT “see” and engage with your surroundings in real time. Users will be able to activate it by tapping a camera icon within the app, prompting the AI to view and comment on whatever it sees through the device’s camera.
As announced during the GPT-4o rollout, the feature is built on Advanced Voice Mode, which lets ChatGPT hold natural, flowing conversations. The addition of vision means ChatGPT can recognize objects and people, remember names, and even make associations between items in its environment.
During the GPT-4o announcement, OpenAI showcased Advanced Voice Mode with vision capabilities. In the demo, ChatGPT identified a dog through the camera, recalled its name, recognized a ball, and associated the two with a game of fetch. The demonstration was all the more impressive given how little input the user had to provide beyond the initial setup.
Now, thanks to new code strings spotted in the latest beta update by Android Authority, it looks like OpenAI is preparing a wider rollout of ChatGPT’s vision capabilities. OpenAI has also added a cautionary warning, advising users not to rely on the “Live Camera” for important decisions, such as navigation or anything that could affect their health or safety.
ChatGPT Live Camera availability
While it’s still in beta, we expect the “Live Camera” feature to become available to ChatGPT Plus subscribers, and possibly other paid tiers, soon. If that happens, it would give ChatGPT a step up over competitors like Google’s Gemini, whose closest offering, Google Lens, doesn’t provide the kind of live capabilities ChatGPT is promising.
A feather in ChatGPT’s cap
ChatGPT has about 10 million paying users, a modest figure next to the 100 million Google One subscribers who have access to Gemini Advanced. With more additions like this, though, ChatGPT could give many users valid reasons to jump ship from whichever chatbot they currently use.