Software developer at a big library, cyclist, photographer, hiker, reader. Email: chris@improbable.org

Actually Using SORA - fxguide

In February, we published our first story on SORA; OpenAI had just released the first clips, and we described SORA at the time as the video equivalent of DALL·E. SORA is a diffusion model that generates videos significantly longer and with more cohesion than any of its rivals. By giving the model foresight of many frames at a time, OpenAI has solved the challenging problem of keeping a subject consistent even when it temporarily goes out of view. SORA generates entire videos at once, up to a minute in length. At the time, OpenAI also published technical notes indicating that it could (in the future) extend generated videos to make them longer or blend two videos seamlessly.

Several select production teams have been given limited access to SORA in the last few weeks. One of the most high-profile was the Shy Kids team, who produced the SORA short film Air Head. Sidney Leeder produced the film, Walter Woodman wrote and directed it, and Patrick Cederberg handled the post-production. The Toronto team have been nicknamed “punk-rock Pixar”; their work has garnered Emmy nominations and been long-listed for the Oscars. We sat down this week with Patrick for a long chat about the current state of SORA.

Shy Kids is a Canadian production company renowned for its eclectic and innovative approach to media production. Originating as a collective of creatives from various disciplines, including film, music, and television, Shy Kids has gained recognition for its unique narrative styles and engaging content. The company often explores adolescence, social anxiety, and the complexities of modern life while maintaining a distinctively whimsical and heartfelt tone. Their work shows a keen eye for visual storytelling and often features a strong integration of original music, making their productions resonant and memorable. Shy Kids has carved out a niche by combining new AI technology with their own creativity, pushing the boundaries of what is possible.

SORA: Mid-April ’24

SORA is in development and is actively being improved through feedback from teams such as Shy Kids, but here is how it currently works. It is important to appreciate that SORA is effectively still pre-alpha: it has not been released, nor is it in beta.

“Getting to play with it was very interesting,” Patrick comments. “It’s a very, very powerful tool that we’re already dreaming up all the ways it can slot into our existing process. But I think with any generative AI tool, control is still the thing that is the most desirable and also the most elusive at this point.”

UI

The user interface allows an artist to input a text prompt; OpenAI’s ChatGPT then converts this into a longer string, which triggers the clip generation. At the moment, there is no other input; it is not yet multimodal. This is significant because, while SORA is rightly applauded for its object consistency within a shot, there is nothing to help make anything from the first shot match a second shot. The results would be different even if you ran the same prompt a second time. “The closest we could get was just being hyper-descriptive in our prompts,” Patrick explains. “Explaining wardrobe for characters, as well as the type of balloon, was our way around consistency because shot to shot / generation to generation, there isn’t the feature set in place yet for full control over consistency.”

The individual clips are remarkable and jaw-dropping for the technology they represent, but making use of them depends on understanding implicit versus explicit shot generation. If you ask SORA for a long tracking shot in a kitchen with a banana on a table, it will rely on its implicit understanding of ‘banana-ness’ to generate a video showing a banana. Through training data, it has ‘learnt’ the implicit aspects of banana-ness, such as ‘yellow’, ‘bent’, ‘has dark ends’, etc. It has no actual recorded images of bananas and no ‘banana stock library’ database; it has a much smaller, compressed, hidden or ‘latent’ space of what a banana is. Every time it runs, it shows another interpretation of that latent space. Your prompt relies on an implicit understanding of banana-ness.
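To make the ‘latent space’ idea concrete, here is a deliberately tiny, hypothetical Python sketch (nothing like OpenAI’s actual architecture, and every function name here is made up): the prompt provides fixed conditioning, but a fresh random latent is drawn on every run and ‘decoded’, which is why the same prompt never produces the same banana twice.

```python
import hashlib
import numpy as np

rng = np.random.default_rng()

def embed_prompt(prompt: str) -> np.ndarray:
    """Stand-in for a text encoder: deterministically hash the prompt to a vector."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:4], "big")
    return np.random.default_rng(seed).normal(size=16)

def decode(prompt_vec: np.ndarray, latent: np.ndarray) -> np.ndarray:
    """Stand-in for the diffusion decoder: mixes fixed conditioning with a random latent."""
    mix = np.outer(prompt_vec, latent)
    return (mix - mix.min()) / (np.ptp(mix) + 1e-9)  # normalise a tiny 'frame' to [0, 1]

prompt_vec = embed_prompt("a long tracking shot of a banana on a kitchen table")

for run in range(3):
    latent = rng.normal(size=16)        # a new random draw every generation
    frame = decode(prompt_vec, latent)  # same prompt, different result each time
    print(f"run {run}: mean pixel value {frame.mean():.3f}")
```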

Prompting the right thing to make Sonny

For Air Head, the scenes were made by generating multiple clips to an approximate script, but there was no explicit way to keep the actual yellow balloon head the same from shot to shot. Sometimes, when the team prompted for a yellow balloon, it wouldn’t even be yellow. Other times, it had a face embedded in it or a face seemingly drawn on the front of the balloon. Because many balloons have strings, the Air Head character, nicknamed Sonny the balloon guy, would often have a string hanging down the front of his shirt: SORA implicitly links strings with balloons, and these would need to be removed in post.

Resolution

Air Head uses only SORA-generated footage, but much of it was graded, treated, and stabilised, and all of it was upscaled or upresed. The clips the team worked with were generated at a lower resolution and then upresed using AI tools outside SORA or OpenAI. “You can do up to 720p (resolution),” Patrick explains. “I believe there’s a 1080 feature that’s out, but it takes a while (to render). We did all of Air Head at 480 for speed and then upresed using Topaz.”
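Topaz is a GUI tool, so the exact steps aren’t scriptable here, but the pipeline stage itself is easy to sketch. As a rough, hypothetical stand-in (a plain Lanczos resize via ffmpeg rather than an AI upscaler, with made-up folder names), batch-upresing a folder of 480p clips to 720p might look like this:

```python
import subprocess
from pathlib import Path

SRC = Path("sora_renders_480p")   # hypothetical folder of 480p SORA clips
DST = Path("upres_720p")          # hypothetical output folder
DST.mkdir(exist_ok=True)

for clip in sorted(SRC.glob("*.mp4")):
    # A plain Lanczos resize as a stand-in for an AI upscaler such as Topaz.
    subprocess.run([
        "ffmpeg", "-y", "-i", str(clip),
        "-vf", "scale=1280:720:flags=lanczos",
        "-c:v", "libx264", "-crf", "18",
        str(DST / clip.name),
    ], check=True)
```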

Prompting ‘time’: A slot machine.

The original prompt is automatically expanded but also displayed along a timeline. “You can go into those larger keyframes and start adjusting information based on changes you want generated,” Patrick explains. “There’s a little bit of temporal control about where these different actions happen in the actual generation, but it’s not precise… it’s kind of a shot in the dark – like a slot machine – as to whether or not it actually accomplishes those things at this point.” Of course, Shy Kids were working with the earliest of prototypes, and SORA is still constantly being worked on.

In addition to choosing a resolution, SORA allows the user to pick the aspect ratio, such as portrait or landscape (or square). This came in handy on the shot that pans up from Sonny’s jeans to his balloon head. Unfortunately, SORA would not render such a move natively, always wanting the main focus of the shot—the balloon head—to be in frame. So the team rendered the shot in portrait mode and then created the pan-up manually in post via cropping.
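The article doesn’t say which tools were used for this, but the idea of faking a pan by animating a crop is simple to sketch. Assuming a hypothetical 1080×1920 portrait render and using ffmpeg’s crop filter (whose x/y expressions are evaluated per frame, with t being the timestamp in seconds), a five-second bottom-to-top pan could look like:

```python
import subprocess

# Hypothetical input (portrait 1080x1920 render) and output file names.
src, dst = "sonny_portrait_1080x1920.mp4", "sonny_panup_1080x608.mp4"

# Crop a 16:9 window (1080x608) out of the portrait frame and slide it
# from the bottom of the frame to the top over the first 5 seconds.
pan = "crop=w=1080:h=608:x=0:y='(in_h-out_h)*max(0,1-t/5)'"

subprocess.run(
    ["ffmpeg", "-y", "-i", src, "-vf", pan, "-c:v", "libx264", "-crf", "18", dst],
    check=True,
)
```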

Prompting camera directions

For many genAI tools, a valuable source of information is the metadata that comes with the training data, such as camera metadata. For example, if you train on still photos, the camera metadata will provide the lens focal length, the f-stop and many other critical pieces of information for the model to train on. With cinematic shots, however, ideas such as ‘tracking’, ‘panning’, ‘tilting’ or ‘pushing in’ are not concepts captured by metadata. As much as object permanency is critical for shot production, so is being able to describe a shot, which Patrick noted was not initially in SORA. “Nine different people will have nine different ideas of how to describe a shot on a film set. And the (OpenAI) researchers, before they approached artists to play with the tool, hadn’t really been thinking like filmmakers.” Shy Kids knew that their access was very early, but “the initial version about camera angles was kind of random.” Whether or not SORA was actually going to register the prompt request or understand it was unknown, as the researchers had just been focused on image generation. Shy Kids were almost shocked by how surprised the OpenAI researchers were by this request. “But I guess when you’re in the silo of just being researchers, and not thinking about how storytellers are going to use it… SORA is improving, but I would still say the control is not quite there. You can put in a ‘Camera Pan’ and I think you’d get it six out of 10 times.” This is not a unique problem; nearly all the major video genAI companies are facing the same issue. Runway AI is perhaps the most advanced in providing a UI for describing the camera’s motion, but Runway’s quality and length of rendered clips are inferior to SORA’s.

Render times

Clips can be rendered in varying lengths, such as 3, 5, 10 or 20 seconds, up to a minute. Render times vary depending on the time of day and the demand for cloud usage. “Generally, you’re looking at about 10 to 20 minutes per render,” Patrick recalls. “From my experience, the duration that I choose to render has a small effect on the render time. If it’s 3 to 20 seconds, the render time tends not to vary much beyond the 10 to 20-minute range. We would generally render the full 20 seconds because then you hope you have more opportunities to slice/edit stuff out and increase your chances of getting something that looks good.”
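Those figures imply a fairly modest throughput, which is worth keeping in mind when the team later mentions generating hundreds of clips. A quick back-of-envelope estimate in Python, using only the numbers quoted above (10–20 minutes per render, rendering the full 20 seconds each time):

```python
# Back-of-envelope throughput, using only the figures quoted in the interview.
render_minutes = (10, 20)   # time per render, as quoted
clip_seconds = 20           # rendering the full 20 s each time

for m in render_minutes:
    clips_per_hour = 60 / m
    footage_per_hour = clips_per_hour * clip_seconds
    print(f"{m:>2} min/render -> {clips_per_hour:.0f} clips/hour, "
          f"~{footage_per_hour:.0f} s of raw footage per hour")
# 10 min/render -> 6 clips/hour, ~120 s of raw footage per hour
# 20 min/render -> 3 clips/hour, ~60 s of raw footage per hour
```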

Roto

While all the imagery was generated in SORA, the balloon still required a lot of post-work. Beyond isolating the balloon so it could be re-coloured, Sonny would sometimes have a face, as if drawn on with a marker, and this would be removed in After Effects. Other similar artifacts were often removed as well.

Editing a 300:1 shooting ratio

The Shy Kids methodology was to approach post-production and editing like a documentary, where there is a lot of footage, and you weave a story from that material rather than strictly shooting to a script. There was a script for the short film, but the team needed to be agile and adapt. “It was just getting a whole bunch of shots and trying to cut it up in an interesting way to the VO,” Patrick recalls.

For the minute and a half of footage that ended up in the film, Patrick estimated that they generated “hundreds of generations at 10 to 20 seconds apiece”, adding, “My math is bad, but I would guess probably 300:1 in terms of the amount of source material to what ended up in the final.”

Comping multiple takes and retiming

On Air Head, the team did not comp multiple takes together. For example, the shots of the balloon drifting over the motor racing were each generated as a single shot, pretty much as seen. However, they are working on a new film that mixes and composites multiple takes into one clip.

Interestingly, many of the Air Head clips were generated as if shot in slow motion, even though this was not requested in the prompt. This happened for unknown reasons, and many of the clips had to be retimed to appear to have been shot in real time. This is easier than the reverse (artificially slowing down real-time motion), but it still seems an odd trait to have been inferred from the training data. “I don’t know why, but it does seem like a lot of clips come out at 50 to 75% speed,” he adds. “So there was quite a bit of adjusting timing to keep it all from feeling like a big slowmo project.”
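The article doesn’t name the retiming tool, so here is a hedged sketch of the operation itself, assuming a hypothetical clip that came out of SORA at roughly two-thirds speed and using ffmpeg’s setpts filter to bring it back to real time (the generated clips are silent, so no audio filter is needed):

```python
import subprocess

# Hypothetical file names; speed_factor > 1 speeds the clip up.
src, dst = "sonny_drifting_slowmo.mp4", "sonny_drifting_realtime.mp4"
speed_factor = 1.5  # a clip generated at ~66% speed needs ~1.5x to feel real-time

subprocess.run([
    "ffmpeg", "-y", "-i", src,
    # setpts rescales frame timestamps; dividing PTS by the factor speeds playback up.
    "-vf", f"setpts=PTS/{speed_factor}",
    "-c:v", "libx264", "-crf", "18",
    dst,
], check=True)
```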

Lighting and grading

Shy Kids used the term ‘35mm film’ in their prompts as a keyword and generally found that it gave the level of consistency they sought. “If we needed high contrast, we could say high contrast, and saying key lighting would generally give us something that was close,” says Patrick. “We still had to take it through a full color grade, and we did our own digital filmic look, where we applied grain and flicker to just sort of meld it all together.” There is no option for additional passes such as mattes or depth passes.

Copyright

OpenAI is trying to be respectful and does not allow material to be generated that violates copyright or appears to be the work of someone it is not. For example, if you prompt something such as ‘35mm film, in a futuristic spaceship a man walks forward with a light sword’, SORA will not allow the clip to be generated, as it is too close to Star Wars. But Shy Kids accidentally bumped into this during early testing. Patrick recalls that when they initially sat down and just wanted to test SORA, “We had that one shot behind the character’s back; it’s kind of that Aronofsky following shot. And I think it was just my dumb brain, as I was tired, but I put ‘Aronofsky type shot’ in and got hit with a ‘can’t do that’.” The Hitchcock zoom was another thing that came up; it has become, by osmosis, a technical term, but SORA would reject the prompt for copyright purposes.

Sound

Shy Kids are known for their audio skills in addition to their visual skills. The music in the short film is their own. “It was a song we had in the back catalogue that we almost immediately decided on because the song’s called The Wind, ” says Patrick. “We all just liked it.”

Patrick himself is the voice of Sonny. “Sometimes we’d feel pacing-wise the film needed another beat. So I would write another line, record it, and come up with some more SORA generations, which is another powerful use of the tool in post: when you’re in a corner and you need to fill a gap, it’s a great way to start brainstorming and just spit clips out to see what you can use to fill the pacing problem.”

Summary

SORA is remarkable; the Shy Kids team produced Air Head with just three people in around 1.5 to 2 weeks. The team is already working on a wonderful, self-aware, and perhaps ironic sequel. “The follow-up is a journalistic approach to Sonny, the balloon guy, and his reaction to fame and subsequent sort of falling out with the world,” says Patrick. “And we’re exploring new techniques!” The team is looking to be a bit more technical in their experimentation, incorporating AE compositing of SORA elements into real live-action footage and using SORA as a supplementary VFX tool.

SORA is very new, and even the basic framework that OpenAI has sketched out and demonstrated for SORA has yet to be made available for early testers to use. It is doubtful that SORA in its current form will be released anytime soon, but it is an incredible advance in a particular type of implicit image generation. For high-end projects, it may be a while before it allows the level of specificity that a director requires. For many others, it will be more than ‘close enough’ while delivering stunning imagery. Air Head still needed a large amount of editorial and human direction to produce this engaging and funny short film. “I just feel like people have to [treat] SORA as an authentic part of their process; however, if they don’t want to engage with anything like that, that’s fine too.”

We can have a different web

Bike Lock NYPD Insists Is 'Industrial' Protest Tool Is a Normal Lock Recommended by Columbia University

Columbia University allowed police to storm campus Tuesday night, and Wednesday morning, the NYPD Deputy Commissioner accused student protestors of using an "industrial" chain to barricade doors.

1 public comment
acdha (Washington, DC): Just a thought, but maybe students wouldn’t need 15 lb chains if the police didn’t ignore theft against people they don’t like…

Commentary: Merchants Are Getting People Killed - Streetsblog San Francisco

Note: GJEL Accident Attorneys regularly sponsors coverage on Streetsblog San Francisco and Streetsblog California. Unless noted in the story, GJEL Accident Attorneys is not consulted for the content or editorial direction of the sponsored content.

A couple of days ago in Berkeley, a driver rammed into a cyclist on an extravagantly large, car-oriented commercial corridor and severed the man’s leg. That man, for no good reason, will suffer a potentially life-destabilizing disability for the rest of his life. The consequence for the driver will be nothing. Nor will the street be any safer in the near future, because years ago some business owners and a pro-car council member killed a bike lane proposal for the corridor.

A month ago, a reckless, impatient driver in San Francisco lost control and rammed into a mother, father, and their two babies waiting at a bus stop — killing the entire family. In August last year, another driver, turning too fast towards a freeway ramp, struck a family and killed their baby. This week it was announced that the driver will suffer no consequences other than symbolic community service and taking a “driver safety class.” Criminal courts will likely impose a similarly weak consequence on the driver who wiped out the family last month.

I don’t think either driver should be thrown in jail — even though evidence suggests their reckless impatience killed people. Certainly, that will haunt them for the rest of their lives. Jail time should be reserved for drivers who intentionally act maliciously. I place most of the blame for these accidents on traffic engineers and city planners who let drivers speed through crowded pedestrian areas.

However, driving is not a right; it’s a privilege, and our social contract depends on people using that privilege responsibly. A responsible society would bar reckless drivers from ever driving again, at the very least in a dense, transit-rich, populated area like San Francisco. Yet our justice system treats cars as being as essential to people’s mobility as oxygen is to the brain, with none of the caution and responsibility a car’s use should demand. The easiest way to get away with murder in the United States is to kill someone with your car and make it look like a traffic accident.

Every day in America, people like you and me die at the hands of drivers who will never suffer any consequence for their actions. Every day, families are torn apart by drivers killing people. Those who survive blows from cars are instantly disabled and disfigured, lose their jobs and livelihoods, and suffer from severe injuries — physical and psychological — for the rest of their lives. Traffic violence is a leading cause of death for most age groups below seniors.

Cities throughout the country, including Berkeley and San Francisco, passed “Vision Zero,” a plan to eliminate traffic deaths, inspired by the Netherlands’ success in creating a traffic-death-free haven. These reforms have failed nearly everywhere in the United States because Americans are politically unwilling to make the changes necessary to make alternatives to driving safer. No large city can boast zero traffic deaths, and nationwide traffic deaths have stubbornly increased since 2010.

The reasons are multi-faceted: the rise of smartphones distracting drivers, regulatory loopholes allowing oversized trucks with massive blind spots, and the decline of public transit ridership putting more cars on the road. Paris initiated traffic safety reforms that reduced driving, over widespread opposition from motorists and merchants. But the mayor persisted and transformed Paris into a walking, cycling, and transit haven. Paris’s transformation is broadly popular and the envy of many in the United States. Unfortunately, no American city has the guts to commit to what Paris did, and I’m going to be frank as to why.

The number one obstacle to any safety improvement is local merchants. Business owners and the merchant class believe that every customer they get arrives by car. They are unswayed by research consistently showing that increased foot traffic and alternative travel to commercial areas increase their profits. Part of this is because merchants are just as car-brained as the general population. But the other part is that merchants disproportionately hear from the patrons who drive and complain about parking. Transit riders, cyclists, and pedestrians don’t advertise to merchants that they didn’t arrive by car.

Though merchants are small in number, the elected officials of most cities give disproportionate attention to business interests and their pro-driving beliefs. Even in progressive Berkeley, home of many climate scientists from the university, transportation decisions are dictated by science illiterates and business interests, not the city’s intellectuals. When Berkeley proposed building a bike lane in my neighborhood, which has no protected bike lanes near a prominent middle school, many locals went uncharacteristically nuts. Plastered on neighborhood businesses were conspiracy theories about a United Nations agenda to force people into plastic cities where they won’t be allowed to own cars. Every other lawn has signs proclaiming economic ruin if drivers are forced to park a whopping 30 seconds away on side streets rather than directly in front of businesses.

Despite the town being highly educated, many Berkeleyans simply closed their ears to modern climate science and empirical evidence on transportation. A writer for The New York Times, one of many residing in Berkeley, privately remarked to me how astonishing it was to witness such a sophisticated population reacting like simpletons to the most modest safety improvements that are commonplace throughout the world.

Sadly, history is repeating itself in San Francisco. Business interests in the West Portal neighborhood where the family was wiped out by a car are already organizing to stop any improvements to the street. This is a major transit hub in S.F., developed before cars were even in mass use, yet the jurisdiction of drivers knows no bounds. If there can’t be a car-free commercial strip in West Portal, there can’t be one anywhere in America. Some business groups see the death of that family as merely an unavoidable consequence, a price paid to ensure drivers don't have to walk an additional 30 seconds from parking on a side street to reach their shops. I don't think the merchants are bad people, but they are so indoctrinated by decades of car-centrism that they harm their own bottom line. A car-free corridor would increase profits for the local businesses!

Because localities refuse to deal with this car carnage, car insurance premiums have skyrocketed as traffic violence has increased. Insurance companies know people are killing themselves and each other at higher rates on the road. Rising premiums also have the side effect of increasing the number of uninsured drivers, which is one reason hit-and-runs keep increasing.

Change takes a long time, and if it’s going to start anywhere, it’s local. Hassle your local elected officials; form groups; start petitions; tell your local businesses that patrons besides motorists matter. I personally let my local grocers know that I arrive by bus or on foot, and my neighbors now keep their bike helmets on while shopping. Even though they advocate against my safety, I don’t hate the merchants; it’s just that they don’t know better.

We can’t keep this status quo going: 45,000 killed and over 100,000 injured by cars a year, yet it gets a fraction of the media coverage a Boeing plane part gets. It’s truly absurd that transit riders, cyclists, and even pedestrians are guaranteed travel on a very limited number of streets, while drivers are granted virtually every road and cry discrimination if a portion of one is given to someone else.

Darrell Owens is a Berkeley-based data analyst, advocate, and housing and transit wonk. A version of this story first appeared on his Substack.

Conservation groups suggest endangered ocelot population may be expanding | The Texas Tribune

Tesla Lays Off Entire Team Behind Brakes

AUSTIN, TX—In the latest round of layoffs for the company’s struggling automotive division, electric vehicle manufacturer Tesla fired the entire team behind brakes, sources confirmed Wednesday. “As we continue to rightsize the Tesla workforce, we have come to the decision that stopping the car is no longer a critical function,” said CEO Elon Musk, whose announcement came as a shock to the team of 500 Tesla workers responsible for the electric vehicles’ braking systems. “As the brakes never really worked anyway, we figured the team’s existence was redundant. Going forward, none of our models will be outfitted with brakes. Instead, we will shift our efforts to making fart noises louder.” At press time, Tesla staffers responsible for wheels were reportedly nervous after receiving an ominous meeting request from HR.
