Thursday, July 25, 2024
- Advertisement -

    Latest Posts

    Here’s what OpenAI said about its latest model GPT-4o

    OpenAI has announced the launch of its latest multimodal AI model GPT-4o, which will be made available to its users for free. The model is unique due to its ability to accept input of any combination of text, audio, and image and generate any combination of text, audio, and image outputs. OpenAI claims that it has GPT-4 level intelligence but “much faster and improves on its capabilities across text, voice, and vision.” Further, OpenAI also claims that its audio response time is similar to human response time.

    GPT-4o will also be available for developers in the API and is reportedly twice as fast and half the price, compared to GPT-4 Turbo. While the capabilities of GPT-4o are available for free, it differs for paid users by having five times the capacity limits.

    Text and image capabilities are starting to roll out today in ChatGPT-4o while the rest of the abilities will be rolled out iteratively. OpenAI plans to launch GPT-4o’s new audio and video capabilities to a “small group of trusted partners in the API” in the coming weeks.

    What can GPT-4o do?

    [We will update this as more capabilities are revealed]

    Text capabilities

    Improvements acrosss languages

    According to OpenAI, 4o “matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages.”  ChatGPT supports more than 50 languages. There have reportedly been notable improvements in efficiency for Indian languages, Gujarati , Telugu, Tamil, Marathi and Urdu.

    The model is capable of generating multiple images depicting a visual narrative based on text inputs and generating caricatures. Further, it can convert textual input into the desired typography.

    Audio capabilities

    GPT-4o reportedly has notable improvements in audio outputs. Previous iterations of did have Voice Mode, but it was significantly slower since it used 3 separate models to provide an output. It was also unable to observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion. “This also brings a lot of latency to the experience, and it really breaks that immersion in the collaboration with ChatGPT. But now, with GPT-40, this all happens natively.”, said Mira Murati, Chief technology officer of OpenAI during a live demonstration.

    OpenAI in its livestream explained that GPT-4o had the ability to be interrupted, respond in real time and pick up emotions and demonstrated how 4o ‘s audio output was “able to generate voice in a variety of different emotive styles.” OpenAI has shared a video of 4o being capable of having real-time conversations, varying its voice based on commands and providing real-time translation. OpenAI also demonstrated the ChatGPT Voice app that functions as an assistant on the Desktop App , assisting with coding. In its blogpost, it also shared summarization of lectures and meetings as examples of use cases.

    Visual capabilities

    The model reportedly also has improved visual capabilities, allowing users to interact over video. During the live demonstration OpenAI displayed the model’s capabilities to guide users while solving equations. It also claimed that 4o has the ability to identify objects and and provide information or interact with them, as demonstrated in this video of GPT-40 identifying objects and providing  real-time Spanish translation. OpenAI also demonstrated that 4o on the desktop app was capable of analyzing data on the Desktop App.

    How safe is GPT-4o?

    Murati said, “GPT-40 presents new challenges for us when it comes to safety, because we’re dealing with real-time audio, real-time vision.” OpenAI claimed that according to its evaluation based on its  Preparedness Framework  GPT-4o does not score above Medium risk for cybersecurity, Chemical, Biological, Radiological and Nuclear (CBRN) information , persuasion, and model autonomy. They acknowledged that GPT-4o’s audio capabilities present unique risks. Thus, audio outputs will be limited to a selection of preset voices at launch.

    OpenAI has introduced multiple features in the last month, including a ‘memory’ feature for ChatGPT plus users, which allows the AI model to remember information users provide across conversations. The feature could turned on or off in the personalisation settings and recorded memories could be ‘forgotten’ by deleting them from the same personalization settings tab.

    In February, the company announced that they would be watermarking all the synthetic images they generated by including Coalition for Content Provenance and Authenticity (C2PA) metadata for all images generated ChatGPT on the web and other OpenAI API services using DALL·E 3. This would allow users to check if the image was generated using OpenAI tools through websites like Content Credentials Verify.

    Prior this, in January, it also launched GPT Store, where users could share custom versions of ChatGPT tailored for specific use cases.

    Also Read:

    STAY ON TOP OF TECH NEWS: Our daily newsletter with the top story of the day from MediaNama, delivered to your inbox before 9 AM. Click here to sign up today!


    The post Here’s what OpenAI said about its latest model GPT-4o appeared first on MediaNama.

    Latest Posts

    - Advertisement -

    Don't Miss

    Stay in touch

    To be updated with all the latest news, offers and special announcements.