OpenAI unveils Sora to make videos from text: Check features, other details

By Ann Roberts On Feb 17, 2024

ChatGPT maker OpenAI on Thursday unveiled ‘Sora’, its new artificial intelligence model that can convert text prompts into realistic videos.

Sora can produce videos after taking instructions from a user on the style and subject of the clip. Besides generating videos from text prompts, it can animate a still image, OpenAI said in a blog post.

“Today we are starting red-teaming and offering access to a limited number of creators,” chief executive Sam Altman posted on X.

The announcement of the text-to-video model comes after OpenAI launched its popular chatbot ChatGPT in late 2022.

“Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” the company said.

What can Sora do?

OpenAI said Sora can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

“We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction,” OpenAI said.

The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions, it added.

Sora can also take an existing still image and generate a video from it. The model can also take an existing video and extend it or fill in missing frames.

Who can access Sora so far?

Sora is still a work in progress and OpenAI has currently granted access to researchers, visual artists, designers and filmmakers to assess critical areas for harms or risks and “to gain feedback on how to advance the model to be most helpful for creative professionals.”

“We’re sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon,” it said.

Any weaknesses in the new AI model?

OpenAI has said the current Sora model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect.

The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.

Steps OpenAI is taking for safety

Sora is not yet available to the public, as OpenAI is taking safety steps before making it available in its products. “We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model,” it said.

“We’re also building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora. We’ve also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it’s shown to the user,” OpenAI said.

The company will also be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for the new technology, it added.

Other text-to-video models

Sora is not the first video-generating model. Meta last year added new AI-based features to its image generation model Emu, which can edit and generate videos from text prompts. Meanwhile, Google earlier this year introduced Lumiere, an AI-powered tool that generates videos from simple text prompts.