From Script to Flick: OpenAI's Sora is the next act

OpenAI recently announced Sora, a text-to-video model currently being ‘red teamed’ (undergoing safety testing) that has taken the web by storm, and rightfully so. Text-to-video models have been floating around for a couple of years now, but none has produced results anywhere near as impressive as Sora’s.

Without getting bogged down in specifics, Sora combines the learnings from text-to-image generators (like DALL-E 3) with those from OpenAI’s flagship product, ChatGPT, to form a model that can keep track of multiple people, animals and objects between frames of a video, creating a more cohesive and realistic sequence.

That might not sound like much, but it’s an enormous leap forward for the AI space, and one that’s come much sooner than anyone expected. This also means we’re having to grapple with a few important questions – some exciting, some uncomfortable, some a bit of both.

Is this the end of the world as we know it?

Probably not.

What does this mean for content?

It means an explosion. Not yet, but definitely soon. The ability to create realistic (or entirely unrealistic) video to your exact needs with a few keystrokes is game-changing for content and production teams. It’s very unlikely we’ll see a complete shift towards AI tools for video ‘capture’ (at least for the time being), but for challenging shots, last-minute changes or footage that’s simply ‘out of scope’, there’s a definite use case.

More than anything, we’re likely to see AI video start to perform a similar function to AI image generation – slowly replacing stock footage libraries.

Will Sora ‘kill’ creative expression?

One of the biggest concerns about AI video (and generative AI as a whole) is the impact it will have on creative fields – the possible AI-driven ‘death of the artist’. While this concern is warranted and one I share to some extent, AI isn’t going to replace or significantly displace creative expression.

People will always create; it’s in our nature to do so. AI is just another tool of creative expression, and in our experience, it’s creatives who are the most eager to play around with it.

Are production teams getting replaced?

Not likely. As with just about every AI tool, Sora is likely to be best utilised by the very people at most risk of being replaced by it. Individuals and teams with a solid understanding of how video conveys ideas and emotions, how shots are constructed and how scenes play out will have a much easier time directing a model like Sora, and they’re likely to get a lot out of it, especially as they uncover new, more creative ways of utilising it.

Is Sora everything we’re talking it up to be?

Undoubtedly there are limitations and roadblocks that OpenAI won’t be showing front and centre. I shudder to think of the sheer scale of processing power needed to render these videos, and the time it takes to do so. We can likely expect a hefty price tag as a consequence.

What we’re seeing at the moment is also likely the best of what it’s capable of producing, not necessarily what it will render on an average day, so it’s best to temper expectations around quality.

When will Sora be available?

At the moment, Sora is undergoing safety and quality testing, so it isn’t generally available. OpenAI also hasn’t set a launch date yet, so expect a wait of at least a good couple of months.
