Reference-to-Video API

reference-to-video is Seedance 2.0's most powerful mode. A single request can include up to 9 reference images + 3 reference videos + 3 reference audio clips, and the model composes a new video guided by all of them.

Typical scenarios:

  • Style reference — a handful of images defining a specific art style; the new video mirrors that style
  • Character / product reference — keep the same virtual character or product appearing in new scenes and actions
  • Cinematography reference — a demo video that conveys the camera pacing and motion you want
  • Music-driven pacing — a reference audio clip that drives the visual rhythm and mood
  • Video editing / extension — continue, extend, or rewrite existing footage

Endpoint

POST https://api.evolink.ai/v1/videos/generations

Model ID: seedance-2.0-reference-to-video

The Fast variant is seedance-2.0-fast-reference-to-video — same parameter structure.

Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | — | Must be seedance-2.0-reference-to-video |
| prompt | string | Yes | — | Video description. Use natural language to describe what each reference asset is for (e.g. "use video 1's first-person perspective, audio 1 as background music throughout"). ≤ 500 Chinese characters or ≤ 1000 English words |
| image_urls | array<string> | No | — | 0–9 reference image URLs |
| video_urls | array<string> | No | — | 0–3 reference video URLs |
| audio_urls | array<string> | No | — | 0–3 reference audio URLs |
| duration | integer | No | 5 | Video duration in seconds, 4–15 |
| quality | string | No | 720p | 480p or 720p |
| aspect_ratio | string | No | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, adaptive |
| generate_audio | boolean | No | true | Whether to generate synchronized audio |
| callback_url | string | No | — | HTTPS URL for task completion callback |

Key constraint: image_urls, video_urls, and audio_urls can all be empty (equivalent to pure text-to-video), but providing only audio_urls is not allowed. Whenever audio is supplied, you must also provide at least one image or one video as a visual anchor.
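The constraint above is easy to check before sending a request. A minimal client-side sketch (validate_reference_assets is a hypothetical helper, not part of any SDK):

```python
def validate_reference_assets(image_urls=None, video_urls=None, audio_urls=None):
    """Pre-flight check of the reference-asset rule.

    All three lists may be empty (pure text-to-video), but audio_urls may
    only be supplied together with at least one image or video.
    """
    images = image_urls or []
    videos = video_urls or []
    audios = audio_urls or []
    if len(images) > 9:
        raise ValueError("image_urls: at most 9 reference images")
    if len(videos) > 3:
        raise ValueError("video_urls: at most 3 reference videos")
    if len(audios) > 3:
        raise ValueError("audio_urls: at most 3 reference audio clips")
    if audios and not (images or videos):
        raise ValueError("audio requires at least one image or video as a visual anchor")
```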

Using the Prompt to Assign Roles

This model has no tag syntax (there are no @Image1, @Video1, or similar tags). You assign roles to each asset using natural language, and the model understands references like "image 1 / video 1 / audio 1" based on array order.

Common patterns:

| Intent | Recommended prompt phrasing |
| --- | --- |
| Use image 1 as the first frame | "Use image 1 as the first frame of the video" |
| Let video 1 drive the camera | "Replicate video 1's camera movement and pacing" |
| Use audio 1 as BGM | "Use audio 1 as background music throughout the entire video" |
| Keep character from image 1 | "The character's appearance stays consistent with image 1" |
| Transfer style from image 2 | "The overall art style references image 2's color palette and texture" |

You can freely combine these patterns in a single prompt. The order of the assets doesn't affect validity, but it does affect how the model interprets "image 1 / image 2" — keep it stable for reproducibility.
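Putting this together, the numbering in the prompt lines up with array order in the request body (the URLs below are placeholders):

```python
# Array order defines the numbering: image_urls[0] is "image 1",
# video_urls[0] is "video 1", audio_urls[0] is "audio 1".
payload = {
    "model": "seedance-2.0-reference-to-video",
    "prompt": (
        "Use image 1 as the first frame of the video. "
        "Replicate video 1's camera movement and pacing. "
        "Use audio 1 as background music throughout the entire video. "
        "Scene: a lighthouse on a cliff at dawn, soft golden light."
    ),
    "image_urls": ["https://example.com/first-frame.jpg"],  # -> "image 1"
    "video_urls": ["https://example.com/camera-ref.mp4"],   # -> "video 1"
    "audio_urls": ["https://example.com/bgm.mp3"],          # -> "audio 1"
}
```

Reordering image_urls would silently change which asset "image 1" resolves to, which is why a stable order matters for reproducibility.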

Input Asset Limits

Images

| Constraint | Limit |
| --- | --- |
| Count | 0–9 |
| Format | .jpeg, .png, .webp |
| Dimensions | 300–6000 px per side |
| Aspect ratio | 0.4 – 2.5 |
| Max size per image | ≤ 30 MB |

Videos

| Constraint | Limit |
| --- | --- |
| Count | 0–3 |
| Format | .mp4, .mov |
| Per-clip duration | 2–15 seconds |
| Total duration | ≤ 15 seconds |
| Resolution | 480p – 720p |
| Frame rate | 24 – 60 FPS |
| Max size per clip | ≤ 50 MB |

Audio

| Constraint | Limit |
| --- | --- |
| Count | 0–3 |
| Format | .wav, .mp3 |
| Per-clip duration | 2–15 seconds |
| Total duration | ≤ 15 seconds |
| Max size per clip | ≤ 15 MB |

Overall

| Constraint | Limit |
| --- | --- |
| Total request body | ≤ 64 MB (URLs only; Base64 inlining is not accepted) |
| Minimum content | At least 1 image or 1 video whenever audio is supplied (audio-only is not permitted) |
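If you already know each file's extension, size, and duration, the per-asset limits above can be enforced locally before upload. A sketch under that assumption (check_clips is a hypothetical helper; durations pass as None for images):

```python
# Limits from the tables above; sizes in MB, durations in seconds.
LIMITS = {
    "image": {"max_count": 9, "max_mb": 30, "formats": {".jpeg", ".png", ".webp"}},
    "video": {"max_count": 3, "max_mb": 50, "formats": {".mp4", ".mov"},
              "clip_s": (2, 15), "total_s": 15},
    "audio": {"max_count": 3, "max_mb": 15, "formats": {".wav", ".mp3"},
              "clip_s": (2, 15), "total_s": 15},
}

def check_clips(kind, clips):
    """clips: list of (extension, size_mb, duration_s or None) tuples."""
    rules = LIMITS[kind]
    if len(clips) > rules["max_count"]:
        raise ValueError(f"{kind}: at most {rules['max_count']} assets")
    total = 0
    for ext, size_mb, dur in clips:
        if ext not in rules["formats"]:
            raise ValueError(f"{kind}: unsupported format {ext}")
        if size_mb > rules["max_mb"]:
            raise ValueError(f"{kind}: file exceeds {rules['max_mb']} MB")
        if dur is not None and "clip_s" in rules:
            lo, hi = rules["clip_s"]
            if not lo <= dur <= hi:
                raise ValueError(f"{kind}: clip duration must be {lo}-{hi} s")
            total += dur
    if "total_s" in rules and total > rules["total_s"]:
        raise ValueError(f"{kind}: total duration exceeds {rules['total_s']} s")
```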

Request Examples

cURL — Three-modal composition (image + video + audio)

curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0-reference-to-video",
    "prompt": "Replicate video 1's first-person perspective and camera pacing. Use audio 1 as the soundtrack for the entire video. Scene: a young rider weaving through a rain-soaked city street at night, neon reflections on wet asphalt.",
    "image_urls": ["https://example.com/rider-style.jpg"],
    "video_urls": ["https://example.com/pov-reference.mp4"],
    "audio_urls": ["https://example.com/synthwave-bgm.mp3"],
    "duration": 10,
    "quality": "720p",
    "aspect_ratio": "16:9"
  }'

Python — Images only (up to 9)

import requests

response = requests.post(
    "https://api.evolink.ai/v1/videos/generations",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "model": "seedance-2.0-reference-to-video",
        "prompt": "The overall art style references the color palette and texture of the 3 provided images. Scene: a small-town summer market at dusk, warm tones.",
        "image_urls": [
            "https://example.com/style-ref-1.jpg",
            "https://example.com/style-ref-2.jpg",
            "https://example.com/style-ref-3.jpg"
        ],
        "duration": 8,
        "aspect_ratio": "16:9"
    }
)

task = response.json()
print(f"Task ID: {task['id']}")

Node.js — Video-only reference (camera replication)

const res = await fetch("https://api.evolink.ai/v1/videos/generations", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "seedance-2.0-reference-to-video",
    prompt: "Replicate video 1's orbital camera movement and velocity curve. Subject: a classical sculpture in a museum hall at dusk.",
    video_urls: ["https://example.com/orbit-shot.mp4"],
    duration: 8,
    quality: "720p",
    aspect_ratio: "16:9"
  })
});

const task = await res.json();
console.log("Task ID:", task.id);

Response

{
    "id": "task-unified-1774857405-abc123",
    "object": "video.generation.task",
    "created": 1774857405,
    "model": "seedance-2.0-reference-to-video",
    "status": "pending",
    "progress": 0,
    "type": "video",
    "task_info": {
        "can_cancel": true,
        "estimated_time": 180,
        "video_duration": 10
    },
    "usage": {
        "billing_rule": "per_second",
        "credits_reserved": 60,
        "user_group": "default"
    }
}
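The task is created in the pending status, so you either wait for the callback_url notification or poll for completion. A minimal polling sketch, assuming the task can be fetched with a GET on the creation path plus the task id — verify the exact retrieval endpoint against the platform's task API before relying on it:

```python
import json
import time
import urllib.request

def is_terminal(status):
    """Treat any status other than pending/processing as terminal."""
    return status not in ("pending", "processing")

def wait_for_task(task_id, api_key, interval=10, timeout=600):
    """Poll a generation task until it reaches a terminal status.

    Assumes GET https://api.evolink.ai/v1/videos/generations/{task_id};
    confirm the retrieval endpoint in the platform docs.
    """
    url = f"https://api.evolink.ai/v1/videos/generations/{task_id}"
    deadline = time.time() + timeout
    while time.time() < deadline:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            task = json.load(resp)
        if is_terminal(task["status"]):
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still running after {timeout} s")
```

The estimated_time field in the response (180 seconds here) is a reasonable starting point for the polling interval and timeout.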

Billing Notes

  • Per-second billing based on the output video's duration
  • Reference video input duration also counts toward billing (a 10-second reference video bills at 10 seconds of input)
  • Audio generation itself is free of extra charge
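Since reference video input also counts, the billed seconds for a request follow directly from the notes above (credit rates vary by plan, so this sketch returns seconds, not credits):

```python
def estimate_billed_seconds(output_duration_s, reference_video_durations_s=()):
    """Billed seconds = output duration + total duration of reference
    video inputs. Audio generation adds no extra charge."""
    return output_duration_s + sum(reference_video_durations_s)
```

For example, a 10-second output driven by one 10-second reference clip bills as 20 seconds of usage, while the same output with images and audio only bills as 10.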

FAQ

Do the reference assets appear directly in the output? No. The model treats them as signals for style / composition / motion / rhythm; the final output is fully generated new content.

Can I send the request without any reference assets? Yes — this acts like pure text-to-video. But if you have no references, use the cheaper text-to-video directly.

Does asset order matter? Yes. If your prompt says "video 1", the model maps that to video_urls[0]. Keeping a stable order makes experiments reproducible.