Multimodal Reference

Seedance 2.0 supports an @ tag reference system that lets you assign specific roles to uploaded images, videos, and audio files within your prompt, giving you fine-grained control over the generated video.

@Tag Syntax

Reference uploaded files in your prompt using @ tags that correspond to the position of each URL in its respective array:

| Tag Format | Maps To | Example |
| --- | --- | --- |
| @Image1 to @Image9 | image_urls[0] to image_urls[8] | @Image1 as first frame |
| @Video1 to @Video3 | video_urls[0] to video_urls[2] | replicate @Video1 camera movement |
| @Audio1 to @Audio3 | audio_urls[0] to audio_urls[2] | @Audio1 for BGM rhythm |

Tags are 1-indexed: @Image1 refers to the first URL in image_urls, @Image2 to the second, and so on.
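The off-by-one between 1-indexed tags and 0-indexed arrays is easy to get wrong, so here is a minimal sketch of the mapping described above. The helper name `tag_positions` and the regular expression are illustrative assumptions, not part of the API; in particular, whether the server matches tag names case-sensitively is not stated in this document.

```python
import re

# Tag prefix -> request field it indexes into (from the table above).
TAG_FIELDS = {"Image": "image_urls", "Video": "video_urls", "Audio": "audio_urls"}
# Assumed tag shape: "@" + type name + positive integer, e.g. "@Image1".
TAG_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")


def tag_positions(prompt):
    """Return (tag, field, zero-based index) for each @ tag in the prompt."""
    return [
        (f"@{kind}{number}", TAG_FIELDS[kind], int(number) - 1)
        for kind, number in TAG_PATTERN.findall(prompt)
    ]


positions = tag_positions("@Image2 as last frame, replicate @Video1 camera movement")
print(positions)
# [('@Image2', 'image_urls', 1), ('@Video1', 'video_urls', 0)]
```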

File Limits

| Type | Max Count | Supported Formats | Max Size | Duration |
| --- | --- | --- | --- | --- |
| Images | 9 | .jpeg, .png, .webp, .bmp, .tiff, .gif | 30MB each | n/a |
| Videos | 3 | .mp4, .mov | 50MB each | 2–15s total |
| Audio | 3 | .mp3, .wav | 15MB each | ≤ 15s total |

Total limit: 12 files across all modalities per request.

Face restriction: Realistic human face uploads are automatically rejected.
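Because the per-type maximums add up to 15 while the total limit is 12, a request can satisfy every per-type limit and still be rejected. The sketch below is a hypothetical client-side pre-check of counts and file extensions only; `check_file_limits` is an illustrative name, treating `.jpg` as equivalent to `.jpeg` is an assumption, and file size, duration, and face detection cannot be verified from a URL alone.

```python
import os
from urllib.parse import urlparse

# (max count, allowed extensions) per request field, copied from the table above.
LIMITS = {
    # ".jpg" is assumed to be accepted alongside ".jpeg".
    "image_urls": (9, {".jpeg", ".jpg", ".png", ".webp", ".bmp", ".tiff", ".gif"}),
    "video_urls": (3, {".mp4", ".mov"}),
    "audio_urls": (3, {".mp3", ".wav"}),
}
MAX_TOTAL_FILES = 12


def check_file_limits(payload):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    total = 0
    for field, (max_count, extensions) in LIMITS.items():
        urls = payload.get(field, [])
        total += len(urls)
        if len(urls) > max_count:
            problems.append(f"{field}: {len(urls)} files exceeds the limit of {max_count}")
        for url in urls:
            # Look only at the URL path so query strings do not hide the extension.
            ext = os.path.splitext(urlparse(url).path)[1].lower()
            if ext not in extensions:
                problems.append(f"{field}: unsupported extension {ext!r} in {url}")
    if total > MAX_TOTAL_FILES:
        problems.append(f"{total} files in total exceeds the limit of {MAX_TOTAL_FILES}")
    return problems


# 9 images + 3 videos + 3 audio files: each type is within its own limit,
# but the combined count of 15 is over the total limit.
full_payload = {
    "image_urls": [f"https://example.com/{i}.png" for i in range(9)],
    "video_urls": [f"https://example.com/{i}.mp4" for i in range(3)],
    "audio_urls": [f"https://example.com/{i}.mp3" for i in range(3)],
}
print(check_file_limits(full_payload))
# ['15 files in total exceeds the limit of 12']
```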

Image @Tag Roles

Use image references to control visual elements of the generated video:

| Role | Prompt Pattern | Description |
| --- | --- | --- |
| First frame | @Image1 as first frame | Use the image as the opening frame of the video |
| Last frame | @Image2 as last frame | Use the image as the closing frame |
| Character reference | @Image1 as character | Maintain character appearance throughout |
| Style reference | @Image1 as style reference | Apply the visual style (colors, mood, aesthetics) |
| Scene reference | @Image1 as scene | Use as background or environment reference |
| Object reference | @Image1 as object | Reference a specific object to appear in the video |
| Composition | @Image1 as composition reference | Follow the layout and framing of the image |

Video @Tag Roles

Use video references to transfer motion, timing, and camera work:

| Role | Prompt Pattern | Description |
| --- | --- | --- |
| Camera movement | replicate @Video1 camera movement | Copy the camera trajectory (pan, tilt, zoom, dolly) |
| Choreography | replicate @Video1 choreography | Match body/object motion patterns |
| Effects | replicate @Video1 effects | Transfer visual effects and transitions |
| Rhythm | match @Video1 rhythm | Sync cut timing and motion pacing |
| Full replication | replicate @Video1 | Reproduce overall motion, camera, and pacing |
| Audio extraction | use @Video1 audio | Extract and use the audio track from the reference video |

Audio @Tag Roles

Use audio references to drive the rhythm and soundtrack of the video:

| Role | Prompt Pattern | Description |
| --- | --- | --- |
| Background music | @Audio1 for BGM rhythm | Sync motion energy and cuts to the music beat |
| Sound effects | @Audio1 as sound effects | Align visual events with audio cues |
| Beat sync | sync to @Audio1 beat | Match motion peaks to musical beats |
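When prompts are generated in code, the role patterns from the tables above can be kept as templates and combined with a scene description. This is only a sketch: the dictionary keys and the `compose_prompt` helper are hypothetical names, and only a few of the documented patterns are included.

```python
# A few prompt patterns taken from the role tables; the keys are illustrative.
ROLE_PATTERNS = {
    "first_frame": "{tag} as first frame",
    "character": "{tag} as character",
    "camera": "replicate {tag} camera movement",
    "beat_sync": "sync to {tag} beat",
}


def compose_prompt(roles, description):
    """Turn (role, tag) pairs into sentences and append the scene description."""
    sentences = []
    for role, tag in roles:
        clause = ROLE_PATTERNS[role].format(tag=tag)
        # Capitalize the first character; clauses starting with "@" are unchanged.
        sentences.append(clause[0].upper() + clause[1:] + ".")
    sentences.append(description.strip())
    return " ".join(sentences)


prompt = compose_prompt(
    [("first_frame", "@Image1"), ("camera", "@Video1"), ("beat_sync", "@Audio1")],
    "A cinematic tracking shot through a neon-lit alley at night.",
)
print(prompt)
```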

API Example

A complete multimodal request combining image, video, and audio references:

import requests

response = requests.post(
    "https://api.evolink.ai/v1/videos/generations",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "model": "seedance-2.0",
        "prompt": (
            "@Image1 as first frame, @Image2 as character reference. "
            "Replicate @Video1 camera movement. "
            "Sync to @Audio1 beat. "
            "A cinematic tracking shot through a neon-lit alley at night."
        ),
        # @Image1 -> first entry, @Image2 -> second entry
        "image_urls": [
            "https://example.com/scene-start.jpg",
            "https://example.com/character-ref.jpg"
        ],
        # @Video1
        "video_urls": [
            "https://example.com/camera-reference.mp4"
        ],
        # @Audio1
        "audio_urls": [
            "https://example.com/soundtrack.mp3"
        ],
        "duration": 10,
        "quality": "1080p",
        "aspect_ratio": "16:9"
    },
    timeout=60,  # arbitrary value; prevents the call from hanging indefinitely
)

print(response.json())

Common Patterns

Character Consistency

Maintain the same character across different scenes by providing a clear character reference image:

@Image1 as character reference. The woman walks through a busy market, picking up an apple, examining it closely.
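One way to apply this pattern across several scenes is to reuse the same reference image in every request and vary only the scene description. The sketch below builds the request bodies without sending them; `scene_payloads` is a hypothetical helper, the URL is a placeholder, and the field names are those used in the API example above.

```python
CHARACTER_IMAGE = "https://example.com/character-ref.jpg"  # placeholder URL


def scene_payloads(scenes, character_url=CHARACTER_IMAGE):
    """Build one request body per scene, all sharing the same character image."""
    return [
        {
            "model": "seedance-2.0",
            # The single image is image_urls[0], so it is always @Image1.
            "prompt": f"@Image1 as character reference. {scene}",
            "image_urls": [character_url],
        }
        for scene in scenes
    ]


payloads = scene_payloads([
    "The woman walks through a busy market, picking up an apple.",
    "The woman sits on a train, watching the rain on the window.",
])
for body in payloads:
    print(body["prompt"])
```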

Camera Replication

Copy the exact camera trajectory from a reference video onto a completely new scene:

@Image1 as first frame. Replicate @Video1 camera movement. A sweeping drone shot over snow-covered mountains.

Music Video

Sync generated visuals to an audio track's beat and rhythm:

@Image1 as style reference. Sync to @Audio1 beat. Fast cuts of urban street scenes, neon lights, dancing figures.

Rules and Restrictions

  • Tags must match the array position — @Image1 is always image_urls[0]
  • You cannot reference more files than provided in the URL arrays
  • Maximum 12 files total across all modalities
  • Realistic human face images are automatically rejected
  • Video references increase generation cost
  • All URLs must be directly accessible by the server (no authentication, no redirects to login pages)
  • Prompt length limit: 2000 tokens including @ tag text
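The first two rules can be checked before a request is sent. The following is a minimal sketch, assuming tags have the form shown in this document; `unmatched_tags` is a hypothetical helper, and the other rules (URL accessibility, face detection, token count) are not covered because they depend on server-side behavior not described here.

```python
import re

TAG_FIELDS = {"Image": "image_urls", "Video": "video_urls", "Audio": "audio_urls"}
TAG_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")  # assumed tag shape


def unmatched_tags(payload):
    """Return the @ tags in the prompt that have no URL at the matching position."""
    missing = []
    for kind, number in TAG_PATTERN.findall(payload.get("prompt", "")):
        count = len(payload.get(TAG_FIELDS[kind], []))
        # Valid tag numbers run from 1 to the length of the array.
        if not 1 <= int(number) <= count:
            missing.append(f"@{kind}{number}")
    return missing


request_body = {
    "prompt": "@Image1 as first frame. @Image3 as character. Sync to @Audio1 beat.",
    "image_urls": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
}
print(unmatched_tags(request_body))
# ['@Image3', '@Audio1']
```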