
GPT-4o: Features and Prospects of OpenAI’s Omni Model

OpenAI’s Spring Update has introduced a new flagship model, GPT-4o, and rekindled the discussion about the future of AI.


Dorota Jasińska

Content Specialist
Marcin Dobosz

Director of Technology

The recent Spring Update by OpenAI brought a new model announcement along with more tools for free ChatGPT users. GPT-4o was introduced as the company’s newest flagship model, with the advancement of AI technology in focus.

The introduction of GPT-4o has been widely commented on across the web and has rekindled the conversation about the future of AI. This is mainly because OpenAI has entered new ground from a technical perspective: the modality and cross-functionality of the technology open up new possibilities.

Spring Update

The most important takeaway from the event was the superior capability of GPT-4o compared to older models and its ability to process and generate content across text, voice, and vision. The livestream showcased several use cases of the model.

GPT-4o

The new flagship model, GPT-4o, with the “o” standing for “omni,” was presented by OpenAI’s CTO, Mira Murati. Murati started with some general information about the recent changes, such as a refreshed UI and the fact that all users will be able to access GPT-4-level capabilities for free.

The revolution GPT-4o brings to the table is mostly due to its cross-functionality: it works on any type of input, whether text, audio, or image, including video. It can also generate responses as a combination of these modalities. Its response time to audio input is similar to that of a human conversation, which was presented in a demo.

The demo also showed GPT-4o’s ability to read and solve a math task from a video, along with tutoring support for the tester. Real-time translation was another presented functionality. The presentation also showed the model’s ability to read and interpret code or graphs and to comment on a person’s emotions based on a photo.

Moreover, according to OpenAI, the new model matches the performance of GPT-4 Turbo on text and code and improves on handling text in non-English languages. It is also much faster and cheaper.

The rollout of the new features is iterative, so we need to wait to test them all. It is worth mentioning that ChatGPT will be available through a desktop app. GPT-4o is also available to developers through the API.
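For developers who want to try the model right away, below is a minimal sketch of a request to GPT-4o through the OpenAI Python SDK. It assumes the openai package (v1.x) is installed and an OPENAI_API_KEY environment variable is set; the prompt text and image URL are placeholder examples, not part of OpenAI’s announcement.

from openai import OpenAI

# Minimal sketch: a text-plus-image request to GPT-4o via the chat completions API.
# Assumes the `openai` package (v1.x) is installed and OPENAI_API_KEY is set.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this chart?"},
                # Placeholder URL; any publicly accessible image would do.
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)

Note that this covers only the text and vision side of the model; the new audio capabilities are not exposed through this endpoint at launch and are expected to roll out later.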

Voice mode

During the demo, the testers used the voice mode to check the capabilities of the new model. They talked with the model and highlighted an upgrade to the speaking mode: the ability to interrupt the bot. In the past it was necessary to wait for the bot to finish talking; now it stops as soon as it hears the user’s voice.

This is a significant upgrade, but there’s more. The generated voice can modulate its tone and change the emotion of the speech. The demo presented a story delivered in different manners and with varying emotional intensity.

All commands to the bot were verbal, and the conversation was mutual and human-like. The bot also made some jokes as a natural reaction to the topic. The voice mode feature will be rolled out over time as OpenAI works through the safety challenges.

Safety and limitations

The presented capabilities raised many questions about the model’s safety and limitations. OpenAI described the actions taken to ensure safety. The company also enlisted external experts to identify the risks posed by the new modalities.

The voice and image inputs also raise some concerns, and it is necessary to put restrictions on the content. The team evaluated the model for possible risks before its release, and the safety interventions prepared for the model will be improved over time.

This is why the model’s capabilities at launch will be limited and released iteratively. For example, audio outputs will be restricted to a preset selection of voices and will comply with current safety policies.

Model availability

OpenAI announced that the availability of GPT-4o will be broader and free, while paid users will have higher capacity limits. Free users who use up their limit will be automatically switched to GPT-3.5 to continue their work.

The capabilities of GPT-4o will be rolled out over time. Voice Mode will include GPT-4o soon. Developers can already access GPT-4o through the API, but without the voice capabilities. The new audio and video capabilities of GPT-4o will be made available to selected partners soon.

Community reaction

The demo was widely commented on across the web. Some people were awed by the possibilities the model opens up, including increased accessibility for people with disabilities. The option to tutor and explain all the steps needed to solve a mathematical problem also has potential for students.

Still, some commenters mentioned the privacy and security implications and the possible misuse of the model, with identity theft as one example. The sci-fi aspect of the model was also addressed through references to the “Black Mirror” series and the movie “Her.”

Conclusion

It seems OpenAI has pushed another technical boundary and stirred discussion about the capabilities, possible uses, and dangers of AI. While access to ChatGPT will be wider, the rollout of the new model is still in progress.

The caution around the release of the new voice and image functions is understandable. Still, we are witnessing another significant leap in the development of deep learning, one that may raise concerns about users’ safety and possible misuse. At this moment, it is too early to judge the impact of OpenAI’s newest flagship model.


Further Exploration

Top AI Solutions Compared: Google Gemini vs. ChatGPT
Top Gen AI Platforms For Software Development 
