Seedance 2.0 and the multi-input moment for video
Nine reference images. Three reference clips. Three audio tracks. We've been trying to break Seedance 2.0 for a week — here's where it shines and where it still trips.
- video
- models
- release-notes
Most video models take a prompt. Seedance 2.0, which ByteDance shipped in early access in March and made generally available this week, takes something closer to a director’s brief: up to nine reference images, three reference clips, three audio tracks, and a prompt. We’ve been beating on it for a week. The capability is real, the rough edges are also real, and the use cases where it wins are not the ones we expected.
Why multi-input is a category, not a feature
The thing video models have been bad at — across every provider — is consistency. Generate three clips of “a woman walking through a coffee shop” and you get three different women in three different coffee shops. For storyboarding, brand work, music videos, episodic content, that’s the actual blocker. The visuals look fine in isolation; they stop being useful the moment you need scene B to look like it belongs in the same world as scene A.
Multi-input fixes this by letting you say “here’s what the character looks like (image 1), here’s the room (image 2), here’s the mood we’re going for (clips 1 and 2), here’s the music (audio 1).” Seedance 2.0 isn’t the first to attempt this — Pika had a version, Runway has a beta — but it’s the first one where it works often enough that we’d recommend building a workflow around it.
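To make the shape of a brief concrete, here is a minimal sketch of it as data, with the input limits quoted in this post enforced up front. The `Brief` class, its field names, and the file names are hypothetical illustrations, not Seedance's actual API.

```python
from dataclasses import dataclass, field

# Limits as stated in this post. Everything else here (class name,
# field names, file names) is a hypothetical illustration.
MAX_IMAGES = 9
MAX_CLIPS = 3
MAX_AUDIO = 3


@dataclass
class Brief:
    prompt: str
    images: list[str] = field(default_factory=list)  # e.g. character, room
    clips: list[str] = field(default_factory=list)   # mood references
    audio: list[str] = field(default_factory=list)   # pacing tracks

    def problems(self) -> list[str]:
        """Return one message per exceeded limit (empty list if none)."""
        found = []
        for name, items, limit in (
            ("images", self.images, MAX_IMAGES),
            ("clips", self.clips, MAX_CLIPS),
            ("audio", self.audio, MAX_AUDIO),
        ):
            if len(items) > limit:
                found.append(f"{name}: {len(items)} given, limit is {limit}")
        return found


brief = Brief(
    prompt="She walks to the counter and orders.",
    images=["character.png", "room.png"],
    clips=["mood_1.mp4", "mood_2.mp4"],
    audio=["track.wav"],
)
print(brief.problems())  # []
```

Checking counts before submitting is cheap, and it keeps a ten-still brand deck from failing only after the upload.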
Where it actually wins
After a week of stress-testing, three categories are clear wins:
Storyboarding for short-form video. Take a brand reference deck, load 4–6 stills into Seedance, and you can generate a sequence of clips that all share a visual identity. Across 8-clip sequences, roughly 80% of the clips kept the character consistent, which is enough to be useful in a real production loop (the inconsistent 20% just gets regenerated).
Music videos and tempo-locked sequences. The audio-track input isn’t for generating sound — Seedance is silent — it’s for pacing. Drop in a 30-second clip of your track and the model cuts the motion to the beats. We tried this on three actual tracks and the result ranged from “okay, useful starting point” to “this is actually shippable with light edits.” Compared to generating clips silently and editing them to music afterward, it’s hours saved.
Brand-consistent product motion. Load a product still and three brand-style references; generate the product rotating, in motion, in context. The product itself stays remarkably stable across generations in a way that prompt-only models can’t match. Whether the context around it looks like your brand is more hit-or-miss, but keeping the product stable is no longer the bottleneck.
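For a sense of what the storyboarding figure means for a generation budget, here is a back-of-envelope sketch. It assumes each clip comes out consistent independently with probability 0.8 and that a bad clip is regenerated until it passes. The independence is our simplifying assumption, not something we measured.

```python
# Back-of-envelope regeneration budget. Assumes each clip is consistent
# independently with probability p (a simplifying assumption) and that a
# bad clip is regenerated until it passes.

def expected_generations(n_clips: int, p: float) -> float:
    """Expected total generations needed to end up with n_clips good clips."""
    return n_clips / p


def clean_first_pass(n_clips: int, p: float) -> float:
    """Probability that every clip is good on the first attempt."""
    return p ** n_clips


print(round(expected_generations(8, 0.8), 2))  # 10.0
print(round(clean_first_pass(8, 0.8), 3))      # 0.168
```

Under those assumptions an 8-clip sequence costs about ten generations on average, and only about one sequence in six comes out clean on the first pass, so it is worth planning for regeneration rather than treating it as the exception.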
Where it still trips
The model has clear weaknesses we don’t think the launch announcement will tell you about:
- Long-form motion. Anything beyond 10–12 seconds and consistency degrades sharply. The model holds the first few seconds well, drifts in the middle, and recovers only sometimes at the end. For now we’d treat it as a clip generator, not a scene generator.
- Complex audio matching. Pacing to a single instrumental track: works. Pacing to dialogue or a track with multiple distinct sections: hit-or-miss. The model picks up on tempo but struggles with structural cues like a verse-chorus transition.
- Hands and feet. Standard caveat — Seedance is not better than the rest of the field here. Tight hand or foot framing is still where video models break.
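One way to act on the long-form limit is to plan a longer scene as a series of short clips. The sketch below splits a scene into equal segments that stay at or under a cap. The 10-second default is our reading of the 10–12 second range above, and `plan_clips` is a hypothetical helper, not part of any Seedance tooling.

```python
import math


def plan_clips(scene_seconds: float, max_clip_seconds: float = 10.0) -> list[tuple[float, float]]:
    """Split a scene into equal (start, end) segments no longer than the cap."""
    if scene_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    # Fewest segments that keep every segment at or under the cap.
    n = math.ceil(scene_seconds / max_clip_seconds)
    length = scene_seconds / n
    return [(round(i * length, 2), round((i + 1) * length, 2)) for i in range(n)]


print(plan_clips(30))  # [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0)]
```

Equal segments avoid ending on a very short leftover clip. Each segment would still need shared references to look like part of the same scene.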
How we use it
In the studio, Seedance 2.0 sits in the video picker like the others but with a different default UI: its input panel shows reference slots up front, because the model is meant to be fed references. If you only provide a text prompt, you’ll get something acceptable, but you’re using the model the wrong way, like using FLUX without specifying a style.
We added a “reference set” pattern to the studio along with this launch: upload a set of stills once, name it (“Northbreak brand kit,” “Maya the character”), and it’s available across video models that support references — not just Seedance. The same reference set can power a Seedance sequence and a Kling continuation in the same project.
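As a rough sketch of the pattern: a reference set is a named, immutable bundle of stills that any reference-capable model can consume. The class, the model identifiers, the capability table, and the request shape below are all hypothetical, chosen for illustration rather than taken from the studio's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceSet:
    name: str
    stills: tuple[str, ...]


# Hypothetical capability table: which models accept reference inputs.
SUPPORTS_REFERENCES = {
    "seedance-2.0": True,
    "kling": True,
    "prompt-only-model": False,
}


def build_request(model: str, prompt: str, refs: ReferenceSet) -> dict:
    """Attach a reference set to a request, refusing models that can't use it."""
    if not SUPPORTS_REFERENCES.get(model, False):
        raise ValueError(f"{model} does not accept reference sets")
    return {"model": model, "prompt": prompt, "references": list(refs.stills)}


maya = ReferenceSet("Maya the character", ("maya_front.png", "maya_side.png"))
opening = build_request("seedance-2.0", "Maya enters the cafe.", maya)
continuation = build_request("kling", "Maya sits by the window.", maya)
print(opening["references"] == continuation["references"])  # True
```

Keeping the set frozen means both requests are guaranteed to see the same stills, which is the point of uploading once.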
Where it fits in the lineup
Seedance 2.0 doesn’t replace anything in our recommendation logic. HappyHorse is still the default for joint audio-video. Veo is still the cinematic hero shot. Grok Imagine is still the social default. Seedance is the brief-driven option — for when you have a clear visual intent backed by reference material and you want consistency to be the headline feature.
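Written down as a lookup, the lineup above might look like the sketch below. The goal labels are hypothetical shorthand for the cases named in this section, and the function illustrates the mapping rather than reproducing our actual recommendation code.

```python
# Hypothetical goal labels mapped to the defaults named in this post.
DEFAULTS = {
    "joint-audio-video": "HappyHorse",
    "cinematic-hero-shot": "Veo",
    "social": "Grok Imagine",
    "brief-with-references": "Seedance 2.0",
}


def recommend(goal: str) -> str:
    """Return the default model for a goal, or fail loudly on an unknown one."""
    try:
        return DEFAULTS[goal]
    except KeyError:
        raise ValueError(f"no default for goal: {goal!r}") from None


print(recommend("brief-with-references"))  # Seedance 2.0
```

Seedance takes a new row in the table; none of the existing rows change.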
Add it to the rotation. Don’t retire anything to make room for it.
Found a typo or want to push back? Email us.