Seedance 2.0 Multimodal Reference: The Ultimate Guide to @Tags

Most AI video generators take a text prompt and give you whatever they feel like. Seedance 2.0 works differently. You upload images, videos, and audio files, then use @tags to tell the model exactly what each file should do — act as a first frame, define camera movement, set the music tempo, or provide a character reference.
This @tag reference system is what separates Seedance 2.0 from Sora 2, Kling 3.0, and Veo 3.1. None of them offer this level of multimodal control.
This guide covers every @tag type, the syntax rules, file limits, and real prompt examples you can use immediately. If you want to follow along with API calls, get your free EvoLink API key — it takes 30 seconds.
What Is the @Tag Reference System?
Traditional text-to-video is a one-input, one-output process: you write a prompt, the model interprets it however it wants. Seedance 2.0 turns this into a multi-input, directed-output process.
Here's the difference:
| Approach | Input | Control Level | Result |
|---|---|---|---|
| Text-only | "A woman dances on stage" | Low — model decides everything | Random woman, random dance, random stage |
| With @tags | @Image1 (character) + @Video1 (dance reference) + prompt | High — you direct each element | Your specific character performs the exact dance you referenced |
The @tag system works like a film director's shot list. Each uploaded file gets a role assignment through natural language in your prompt:
- @Image1 as the first frame — pins the opening visual
- @Video1 for camera movement reference — copies the cinematography
- @Audio1 as background music — sets the soundtrack and rhythm
You can combine up to 12 files in a single generation, drawn from at most 9 images, 3 videos, and 3 audio clips, each tagged with a specific purpose.
@Tag Syntax Rules — The Complete Reference
Basic Syntax
The format is straightforward: @ + asset type + number.
@Image1, @Image2, @Image3 ... @Image9
@Video1, @Video2, @Video3
@Audio1, @Audio2, @Audio3
In your prompt, you reference these tags and describe their role in natural language:
@Image1 as the first frame, @Image2 as character reference,
reference @Video1's camera movement and tracking shots,
use @Audio1 for background music tempo.
Note: On the Jimeng (即梦) platform, tags use the Chinese format:
@图片1,@视频1,@音频1. Through the API, use @Image1, @Video1, @Audio1.
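As a sanity check before sending a request, a short regex sketch can flag tags that fall outside the valid @Image1–9, @Video1–3, @Audio1–3 range. (The pattern and helper name here are our own illustration, not part of the API.)

```python
import re

# Valid tags per the syntax rules: @Image1..9, @Video1..3, @Audio1..3.
TAG_RE = re.compile(r"@(Image[1-9]|Video[1-3]|Audio[1-3])\b")
ANY_TAG_RE = re.compile(r"@\w+")

def invalid_tags(prompt):
    """Return any @-tags in the prompt that don't match the valid forms."""
    valid = {m.group(0) for m in TAG_RE.finditer(prompt)}
    return [t for t in ANY_TAG_RE.findall(prompt) if t not in valid]

print(invalid_tags("@Image1 as the first frame, @Video1 for camera movement"))
# -> []
print(invalid_tags("@File3 and @Image10 are wrong"))
# -> ['@File3', '@Image10']
```

Catching a stray @File3 or @Image10 locally is cheaper than burning a generation on a prompt the model can't resolve.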
File Limits and Formats
| Asset Type | Max Count | Formats | Size Limit | Notes |
|---|---|---|---|---|
| Images | 9 | JPEG, PNG, WebP, BMP, TIFF, GIF | 30 MB each | Higher resolution = better output |
| Videos | 3 | MP4, MOV | 50 MB each | Total duration: 2–15s, resolution: 480p–720p |
| Audio | 3 | MP3, WAV | 15 MB each | Total duration: ≤ 15s |
| Combined | 12 total | — | — | Mix any combination within limits |
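If you build requests programmatically, a client-side pre-check that mirrors the table above can reject an over-budget file mix before upload. This is a sketch under the published limits; the helper name is ours.

```python
# Published limits: at most 9 images, 3 videos, 3 audio clips, 12 files total.
LIMITS = {"image": 9, "video": 3, "audio": 3}
MAX_TOTAL = 12

def validate_counts(images=0, videos=0, audios=0):
    """Return a list of limit violations; an empty list means the mix is allowed."""
    counts = {"image": images, "video": videos, "audio": audios}
    errors = [f"too many {kind}s: {n} > {LIMITS[kind]}"
              for kind, n in counts.items() if n > LIMITS[kind]]
    total = sum(counts.values())
    if total > MAX_TOTAL:
        errors.append(f"total files {total} > {MAX_TOTAL}")
    return errors

print(validate_counts(images=9, videos=3))           # -> []
print(validate_counts(images=9, videos=3, audios=3)) # -> ['total files 15 > 12']
```

Note that the per-type maximums can't all be hit at once: 9 + 3 + 3 would be 15 files, and the combined cap is 12.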
The Two Entry Modes
Seedance 2.0 has two generation modes. Your input determines which one to use:
- First/Last Frame Mode — Upload only a starting image (+ optional ending image) with a text prompt. Simple and fast.
- All-Round Reference Mode — Upload any combination of images, videos, and audio with @tag assignments. This is where the full power lives.
Rule: If you upload any video or audio reference, or more than 2 images, you must use All-Round Reference mode.
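The rule above is mechanical enough to encode as a tiny helper (the function name is ours, for illustration):

```python
def entry_mode(images=0, videos=0, audios=0):
    """Any video/audio reference, or more than 2 images, forces All-Round mode."""
    if videos > 0 or audios > 0 or images > 2:
        return "all-round-reference"
    return "first-last-frame"

print(entry_mode(images=1))            # first frame only -> first-last-frame
print(entry_mode(images=2))            # first + last frame -> first-last-frame
print(entry_mode(images=2, videos=1))  # -> all-round-reference
```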
Image @Tags — Control Visual Identity
Image references are the most versatile @tag type. A single image can serve many different purposes depending on how you describe it in your prompt.
Reference Types for Images
| Purpose | Prompt Pattern | Example |
|---|---|---|
| First frame | @Image1 as the first frame | Pins the exact opening visual of your video |
| Last frame | @Image2 as the last frame | Defines the ending visual for transitions |
| Character identity | @Image1 is the main character | Maintains face/body consistency throughout |
| Style reference | reference @Image1's art style | Applies painting style, color palette, or visual aesthetic |
| Scene/environment | scene references @Image3 | Sets the location, background, architecture |
| Object reference | the product in @Image1 | Maintains product details for commercials |
| Composition | framing references @Image1 | Copies the camera angle and layout |
Example: Style Transfer with Van Gogh
Prompt:
A young woman with long blonde hair in a blue dress stands on a hilltop,
gazing at a Provençal village at sunset. Entirely rendered in @Image1's
post-impressionist art style — thick impasto brushstrokes, swirling textures,
rich yellows and blues.
Input: One Van Gogh painting as @Image1
Result: The model renders the entire scene in Van Gogh's signature style — not a filter overlay, but genuine style transfer that maintains the brushstroke texture throughout the video.
Video: Style transfer using @Image reference — Van Gogh post-impressionist rendering
Example: Product Commercial
Prompt:
Commercial showcase of the handbag in @Image2.
Side profile references @Image1.
Surface material texture references @Image3.
Display all product details with cinematic camera movement.
Grand orchestral background music.
Input: 3 images — side view, main product photo, material close-up
Result: A polished product video that maintains exact material textures and proportions from your reference images — no AI hallucination on product details.
Multi-Image Character Consistency
When you need the same character across multiple shots, upload several reference images from different angles:
@Image1 and @Image2 define the main character's appearance.
The character walks through @Image3's environment,
wearing the outfit from @Image4.
The more reference images you provide for a character, the more consistent the output. This solves the "face morphing" problem that plagues single-image generation.
Video @Tags — Replicate Camera & Motion
Video references unlock Seedance 2.0's most impressive capability: precise replication of camera work and physical motion. Upload a reference video, and the model copies the exact cinematography, action choreography, or visual effects.
Reference Types for Videos
| Purpose | Prompt Pattern | What Gets Copied |
|---|---|---|
| Camera movement | reference @Video1's camera movement | Pan, tilt, dolly, tracking, zoom patterns |
| Action/choreography | perform the actions from @Video1 | Body movement, dance steps, fight choreography |
| Visual effects | reference @Video1's transition effects | Particle effects, style transitions, VFX |
| Rhythm/pacing | match @Video1's editing rhythm | Cut timing, beat synchronization, tempo |
| Full replication | completely reference @Video1 | Everything — camera, action, effects, pacing |
Example: Cinematic Camera Replication
Prompt:
Reference @Image1's character. He is in @Image2's elevator.
Completely reference @Video1's camera movements and the protagonist's
facial expressions. Hitchcock zoom when the character is frightened,
then several orbiting shots inside the elevator.
The elevator door opens, tracking shot follows him out.
Exterior scene references @Image3.
Input: 3 images (character, elevator interior, exterior scene) + 1 reference video (with desired camera work)
Result: The model reproduces the exact Hitchcock zoom, orbital camera movements, and tracking shots from your reference video — applied to a completely different character and setting.
Camera Techniques You Can Replicate
Seedance 2.0 can reproduce these camera movements from a reference video:
- Hitchcock zoom (dolly zoom / vertigo effect)
- 360° orbit around the subject
- One-shot continuous take (no cuts)
- Mechanical arm multi-angle tracking
- Low-angle hero shots
- Handheld chase camera
- Fish-eye lens distortion
- Push-pull rhythmic movement
Prompt tip: Be specific about which aspect of the reference video to copy. "Reference @Video1's camera movement" is better than just "reference @Video1" — it tells the model to focus on cinematography rather than trying to copy everything. For camera reference examples with complete Python code, see our dedicated camera movement tutorial.
Example: Action Parkour
Video: Dynamic parkour with cinematic tracking shot — generated with camera movement reference
Audio @Tags — Sound Design with References
Seedance 2.0 generates native audio with every video — sound effects, ambient noise, music, and even dialogue. Audio @tags give you control over what it sounds like.
Reference Types for Audio
| Purpose | Prompt Pattern | What Gets Copied |
|---|---|---|
| Background music | use @Audio1 for background music | Musical style, tempo, instruments |
| Sound effects | sound effects reference @Audio1 | Specific sound textures and timing |
| Voice/narration style | narration voice references @Video1 | Vocal tone, speaking pace, accent |
| Beat sync | match @Audio1's rhythm for editing cuts | Music beats drive visual transitions |
Beat Synchronization (Music Video Mode)
One of the most powerful audio features: upload a music track, and the model synchronizes visual cuts and transitions to the beat.
Prompt:
@Image1 through @Image7 as scene references.
Match @Audio1's rhythm for editing cuts and beat synchronization.
Each image appears on a music beat with dynamic transitions.
Enhance visual impact with dramatic lighting changes on each cut.
Result: The model creates a music-video-style edit where scene transitions, camera movements, and lighting shifts happen precisely on the beat of your reference audio.
Using Video Audio as Reference
You don't need a separate audio file — you can reference the audio track from an uploaded video:
Background music references @Video1's audio.
This is useful when you want to replicate the sound design of an existing video while changing the visuals.
Example: Character Dialogue
Video: AI-generated character dialogue with natural voice acting and ambient café sounds
Seedance 2.0 supports multi-language dialogue generation, including English, Chinese, Spanish, Korean, and more. Write the dialogue directly in your prompt, and the model generates matching lip-sync and voice acting.
Advanced Combinations — Multi-Modal Recipes
The real power of @tags emerges when you combine multiple modalities. Here are three proven recipes for common production scenarios.
Recipe 1: Cinematic Short Film
Goal: Film-quality scene with specific character, camera work, and soundtrack
Files:
- @Image1: Character face/body reference
- @Image2: Environment/location reference
- @Video1: Camera movement reference (e.g., tracking shot from a film)
- @Audio1: Background music track
Prompt:
@Image1's character walks through @Image2's environment.
Camera movement follows @Video1's tracking shot pattern.
Background music uses @Audio1.
Cinematic lighting, shallow depth of field, 24fps film grain.
File allocation: 2 images + 1 video + 1 audio = 4/12 files used
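Assembled as a request payload, Recipe 1 might look like the following. Field names mirror this guide's API example and FAQ (image_urls, video_urls, audio_urls); the URLs are placeholders.

```python
# Recipe 1 as an EvoLink request body (sketch; URLs are placeholders).
recipe_1 = {
    "model": "seedance-2.0",
    "prompt": (
        "@Image1's character walks through @Image2's environment. "
        "Camera movement follows @Video1's tracking shot pattern. "
        "Background music uses @Audio1. "
        "Cinematic lighting, shallow depth of field, 24fps film grain."
    ),
    "image_urls": ["https://your-cdn.com/character.jpg",
                   "https://your-cdn.com/environment.jpg"],
    "video_urls": ["https://your-cdn.com/tracking-shot.mp4"],
    "audio_urls": ["https://your-cdn.com/soundtrack.mp3"],
}

files_used = (len(recipe_1["image_urls"]) + len(recipe_1["video_urls"])
              + len(recipe_1["audio_urls"]))
print(f"{files_used}/12 files used")  # -> 4/12 files used
```

The @tag numbers in the prompt follow upload order within each array: @Image1 is the first image URL, @Video1 the first video URL, and so on.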
Recipe 2: E-Commerce Product Video
Goal: Professional product showcase from static product photos
Files:
- @Image1: Product main shot
- @Image2: Product side view
- @Image3: Material/texture close-up
- @Video1: Camera movement reference (orbiting product shot)
Prompt:
Commercial showcase of the product in @Image2.
Side profile references @Image1.
Surface material and texture reference @Image3.
Camera movement references @Video1's orbiting rotation.
Studio lighting, reflective dark surface, premium aesthetic.
File allocation: 3 images + 1 video = 4/12 files used
Recipe 3: Multi-Character Animation
Goal: Two characters interacting with choreographed action
Files:
- @Image1, @Image2: Character A (front + side reference)
- @Image3, @Image4: Character B (front + side reference)
- @Image5: Background/scene reference
- @Video1: Action choreography reference
Prompt:
@Image1 and @Image2 define Character A (spear wielder).
@Image3 and @Image4 define Character B (dual swords).
They fight in @Image5's autumn forest, mimicking @Video1's
combat choreography. White dust rises on impact.
Dramatic star-filled night sky.
File allocation: 5 images + 1 video = 6/12 files used
The 12-File Budget: Allocation Strategy
You have 12 slots. Here's how to allocate them for maximum impact:
| Priority | Allocation | Why |
|---|---|---|
| Character identity | 2-3 images per character | More angles = better consistency |
| Camera/motion reference | 1 video | One good reference is enough |
| Scene/environment | 1-2 images | Sets the world |
| Audio/music | 1 audio or video (for its audio track) | Sets the mood |
| Style reference | 1 image (if needed) | Only if you want non-realistic style |
| Reserve | Keep 2-3 slots free | For iteration and additional detail |
Pro tip: Don't use all 12 slots. Start with 4-6 files and add more only if the output needs more precision. Overloading with references can confuse the model.
API Call Example
Here's how a multimodal generation looks through the API:
```python
import requests

response = requests.post(
    "https://api.evolink.ai/v1/videos/generations",
    headers={"Authorization": "Bearer YOUR_EVOLINK_API_KEY"},
    json={
        "model": "seedance-2.0",
        "prompt": (
            "@Image1 as the main character. "
            "@Image2 as the environment. "
            "Reference @Video1's tracking shot and camera movement. "
            "The character walks through a misty forest at dawn. "
            "Cinematic lighting, shallow depth of field."
        ),
        "image_urls": [
            "https://your-cdn.com/character.jpg",
            "https://your-cdn.com/forest.jpg"
        ],
        "video_urls": [
            "https://your-cdn.com/tracking-shot.mp4"
        ],
        "duration": 10,
        "quality": "1080p",
        "generate_audio": True  # Python bool; serialized to JSON true
    },
)

task_id = response.json()["id"]
print(f"Generation started: {task_id}")
```
Poll for the result:
```python
import time

import requests

# Check the task status every 5 seconds until it completes or fails.
while True:
    status = requests.get(
        f"https://api.evolink.ai/v1/tasks/{task_id}",
        headers={"Authorization": "Bearer YOUR_EVOLINK_API_KEY"},
    )
    result = status.json()
    if result["status"] == "completed":
        print(f"Video ready: {result['results'][0]}")
        break
    elif result["status"] == "failed":
        print(f"Error: {result.get('error', 'Unknown error')}")
        break
    time.sleep(5)
```
Run this code with your EvoLink API key. Sign up is free — no credit card required.
Common Mistakes & How to Fix Them
Not specifying the @tag's purpose
Bad: @Image1 @Video1 generate a video of a dancer
Good: @Image1 as the dancer's appearance reference. @Video1 for dance choreography and camera movement. Generate the dancer performing on a stage.
The model needs explicit role assignments. Without them, it guesses — and guesses wrong.
Low-resolution input files
If your @Image1 is 480p, your output will look soft. Always use:
- Images: 2K or higher resolution
- Videos: 720p, clean footage without compression artifacts
- Audio: 128kbps+ MP3 or lossless WAV
Trying to use all 12 file slots
More references don't automatically mean better output. Start with 3-5 files and add more only if needed. Too many conflicting references confuse the model.
Uploading realistic human face photos
Platform limitation: Seedance 2.0 currently does not support uploading images or videos containing realistic human faces. The system will automatically block these uploads. Use illustrated, anime-style, or stylized character references instead.
Mixing up asset numbering
When you upload 3 images and 2 videos, they are numbered independently:
- Images: @Image1, @Image2, @Image3
- Videos: @Video1, @Video2
Don't write @File3 or @Asset5 — use the type-specific numbering.
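When building requests programmatically, a small helper (the name is ours) makes the per-type numbering explicit by mapping each uploaded URL to its tag in upload order:

```python
def assign_tags(image_urls=(), video_urls=(), audio_urls=()):
    """Map each uploaded URL to its @tag; numbering restarts per asset type."""
    tags = {}
    for prefix, urls in (("Image", image_urls),
                         ("Video", video_urls),
                         ("Audio", audio_urls)):
        for i, url in enumerate(urls, start=1):
            tags[f"@{prefix}{i}"] = url
    return tags

tags = assign_tags(image_urls=["a.jpg", "b.jpg", "c.jpg"],
                   video_urls=["x.mp4", "y.mp4"])
print(sorted(tags))
# -> ['@Image1', '@Image2', '@Image3', '@Video1', '@Video2']
```

Printing this mapping alongside your prompt is an easy way to confirm every tag you reference actually points at the file you intended.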
Setting wrong duration for video extensions
When extending an existing video by 5 seconds, set the generation duration to 5s (the new portion), not the total length. The extension is appended to the original.
FAQ
How many files can I upload in a single generation?
Up to 12 files total: maximum 9 images, 3 videos, and 3 audio clips. Videos must have a combined duration between 2 and 15 seconds. Audio clips can total up to 15 seconds.
Can I use @tags through the API?
Yes. When calling the API, pass image_urls, video_urls, and audio_urls arrays in the JSON request body. Each array contains direct URLs to your reference files. The @tag numbering (@Image1, @Image2...) corresponds to the order of URLs in each array. The prompt text uses the same @tag syntax as the UI.
What happens if I don't assign a role to an @tag?
The model will attempt to infer the purpose based on the file content and your prompt context. However, this is unreliable. Always explicitly state each tag's role — e.g., @Image1 as the first frame rather than just mentioning @Image1 without context.
Can I reference audio from an uploaded video file?
Yes. Use background music references @Video1's audio in your prompt. The model extracts the audio track from the video and uses it as a sound reference without needing a separate audio file.
What image and video formats are supported?
Images: JPEG, PNG, WebP, BMP, TIFF, GIF (max 30 MB each). Videos: MP4, MOV (max 50 MB each, 480p–720p resolution). Audio: MP3, WAV (max 15 MB each).
Start Building with @Tags
The @tag reference system is what makes Seedance 2.0 the most controllable AI video generator available. Instead of describing what you want and hoping for the best, you show the model exactly what you mean — then direct it like a film crew.
The key principles:
- Every @tag needs a role. Don't just upload files — tell the model what each one does.
- Start small, add precision. Begin with 3-4 references. Add more only if the output needs it.
- Be specific about what to copy. "Reference @Video1's camera movement" beats "reference @Video1."
Ready to direct your own AI videos? Start free on EvoLink — one API key for Seedance 2.0 and all major AI video models, with smart routing that saves you 20-70%.
Continue learning:
- Seedance 2.0 Prompt Guide — Master prompt writing fundamentals
Last updated: February 20, 2026 | Written by J, Growth Lead at EvoLink