Reference-to-Video API
reference-to-video is Seedance 2.0's most powerful mode. A single request can include up to 9 reference images + 3 reference videos + 3 reference audio clips, and the model composes a new video guided by all of them.
Typical scenarios:
- Style reference — a handful of images defining a specific art style; the new video mirrors that style
- Character / product reference — keep the same virtual character or product appearing in new scenes and actions
- Cinematography reference — a demo video that conveys the camera pacing and motion you want
- Music-driven pacing — a reference audio clip that drives the visual rhythm and mood
- Video editing / extension — continue, extend, or rewrite existing footage
Endpoint
POST https://api.evolink.ai/v1/videos/generations
Model ID: seedance-2.0-reference-to-video
The Fast variant is seedance-2.0-fast-reference-to-video — same parameter structure.
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Must be seedance-2.0-reference-to-video |
| prompt | string | Yes | — | Video description. Use natural language to describe what each reference asset is for (e.g. "use video 1's first-person perspective, audio 1 as background music throughout"). ≤ 500 Chinese characters or ≤ 1000 English words |
| image_urls | array<string> | No | — | 0–9 reference image URLs |
| video_urls | array<string> | No | — | 0–3 reference video URLs |
| audio_urls | array<string> | No | — | 0–3 reference audio URLs |
| duration | integer | No | 5 | Video duration in seconds, 4–15 |
| quality | string | No | 720p | 480p or 720p |
| aspect_ratio | string | No | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, or adaptive |
| generate_audio | boolean | No | true | Whether to generate synchronized audio |
| callback_url | string | No | — | HTTPS URL for the task-completion callback |
Key constraint:
image_urls, video_urls, and audio_urls can all be empty (equivalent to pure text-to-video), but providing only audio_urls is not allowed. Whenever audio is supplied, you must also provide at least one image or one video as a visual anchor.
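This constraint is easy to check client-side before sending a request. The helper below is an illustrative sketch (not part of any official SDK) that mirrors the documented rule:

```python
# Client-side pre-flight check mirroring the documented constraint
# (illustrative helper, not part of the official SDK).
def validate_references(image_urls=None, video_urls=None, audio_urls=None):
    images = image_urls or []
    videos = video_urls or []
    audios = audio_urls or []
    if len(images) > 9 or len(videos) > 3 or len(audios) > 3:
        raise ValueError("too many assets (max 9 images, 3 videos, 3 audio clips)")
    # Audio-only requests are rejected by the API: audio needs a visual anchor.
    if audios and not (images or videos):
        raise ValueError("audio references require at least one image or video")

# Passes: audio is accompanied by an image.
validate_references(image_urls=["https://example.com/a.jpg"],
                    audio_urls=["https://example.com/bgm.mp3"])
```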
Using the Prompt to Assign Roles
This model has no tag syntax (there are no @Image1, @Video1, or similar tags). You assign roles to each asset using natural language, and the model understands references like "image 1 / video 1 / audio 1" based on array order.
Common patterns:
| Intent | Recommended prompt phrasing |
|---|---|
| Use image 1 as the first frame | "Use image 1 as the first frame of the video" |
| Let video 1 drive the camera | "Replicate video 1's camera movement and pacing" |
| Use audio 1 as BGM | "Use audio 1 as background music throughout the entire video" |
| Keep character from image 1 | "The character's appearance stays consistent with image 1" |
| Transfer style from image 2 | "The overall art style references image 2's color palette and texture" |
You can freely combine these patterns in a single prompt. The order of the assets doesn't affect validity, but it does affect how the model interprets "image 1 / image 2" — keep it stable for reproducibility.
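The mapping between prompt wording and array order can be sketched as follows (URLs are placeholders; the point is that "image 1" means the first element of image_urls):

```python
# Build a payload where the prompt's "image 1" / "video 1" wording matches
# the array order. Keep the lists stable between runs for reproducibility.
image_urls = [
    "https://example.com/hero-character.png",  # "image 1" — character identity
    "https://example.com/style-board.png",     # "image 2" — art style
]
video_urls = ["https://example.com/camera-demo.mp4"]  # "video 1" — camera reference

prompt = (
    "The character's appearance stays consistent with image 1. "
    "The overall art style references image 2's color palette and texture. "
    "Replicate video 1's camera movement and pacing."
)

payload = {
    "model": "seedance-2.0-reference-to-video",
    "prompt": prompt,
    "image_urls": image_urls,
    "video_urls": video_urls,
}
```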
Input Asset Limits
Images
| Constraint | Limit |
|---|---|
| Count | 0–9 |
| Format | .jpeg, .png, .webp |
| Dimensions | 300–6000 px per side |
| Aspect ratio | 0.4 – 2.5 |
| Max size per image | ≤ 30 MB |
Videos
| Constraint | Limit |
|---|---|
| Count | 0–3 |
| Format | .mp4, .mov |
| Per-clip duration | 2–15 seconds |
| Total duration | ≤ 15 seconds |
| Resolution | 480p – 720p |
| Frame rate | 24 – 60 FPS |
| Max size per clip | ≤ 50 MB |
Audio
| Constraint | Limit |
|---|---|
| Count | 0–3 |
| Format | .wav, .mp3 |
| Per-clip duration | 2–15 seconds |
| Total duration | ≤ 15 seconds |
| Max size per clip | ≤ 15 MB |
Overall
| Constraint | Limit |
|---|---|
| Total request body | ≤ 64 MB (no Base64 inlining) |
| Minimum content | At least 1 image OR 1 video (audio-only is not permitted) |
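The duration limits above can also be validated locally before upload. This is a sketch of such a check (the API enforces the same limits server-side; the helper and its error messages are illustrative):

```python
# Local duration/count check against the limits tables above
# (illustrative helper; the API enforces these server-side as well).
LIMITS = {
    "video": {"max_count": 3, "clip_s": (2, 15), "max_total_s": 15},
    "audio": {"max_count": 3, "clip_s": (2, 15), "max_total_s": 15},
}

def check_clips(kind, durations_s):
    """Validate per-clip and total durations for video/audio references."""
    lo, hi = LIMITS[kind]["clip_s"]
    if len(durations_s) > LIMITS[kind]["max_count"]:
        raise ValueError(f"too many {kind} clips (max {LIMITS[kind]['max_count']})")
    if any(d < lo or d > hi for d in durations_s):
        raise ValueError(f"each {kind} clip must be {lo}-{hi} s")
    if sum(durations_s) > LIMITS[kind]["max_total_s"]:
        raise ValueError(f"total {kind} duration exceeds {LIMITS[kind]['max_total_s']} s")

check_clips("video", [6, 8])  # OK: 2 clips, 14 s total
```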
Request Examples
cURL — Three-modal composition (image + video + audio)
curl -X POST https://api.evolink.ai/v1/videos/generations \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2.0-reference-to-video",
"prompt": "Replicate video 1's first-person perspective and camera pacing. Use audio 1 as the soundtrack for the entire video. Scene: a young rider weaving through a rain-soaked city street at night, neon reflections on wet asphalt.",
"image_urls": ["https://example.com/rider-style.jpg"],
"video_urls": ["https://example.com/pov-reference.mp4"],
"audio_urls": ["https://example.com/synthwave-bgm.mp3"],
"duration": 10,
"quality": "720p",
"aspect_ratio": "16:9"
}'
Python — Images only (up to 9)
import requests
response = requests.post(
"https://api.evolink.ai/v1/videos/generations",
headers={
"Authorization": "Bearer YOUR_API_KEY",
"Content-Type": "application/json"
},
json={
"model": "seedance-2.0-reference-to-video",
"prompt": "The overall art style references the color palette and texture of the 3 provided images. Scene: a small-town summer market at dusk, warm tones.",
"image_urls": [
"https://example.com/style-ref-1.jpg",
"https://example.com/style-ref-2.jpg",
"https://example.com/style-ref-3.jpg"
],
"duration": 8,
"aspect_ratio": "16:9"
}
)
task = response.json()
print(f"Task ID: {task['id']}")
Node.js — Video-only reference (camera replication)
const res = await fetch("https://api.evolink.ai/v1/videos/generations", {
method: "POST",
headers: {
"Authorization": "Bearer YOUR_API_KEY",
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "seedance-2.0-reference-to-video",
prompt: "Replicate video 1's orbital camera movement and velocity curve. Subject: a classical sculpture in a museum hall at dusk.",
video_urls: ["https://example.com/orbit-shot.mp4"],
duration: 8,
quality: "720p",
aspect_ratio: "16:9"
})
});
const task = await res.json();
console.log("Task ID:", task.id);
Response
{
"id": "task-unified-1774857405-abc123",
"object": "video.generation.task",
"created": 1774857405,
"model": "seedance-2.0-reference-to-video",
"status": "pending",
"progress": 0,
"type": "video",
"task_info": {
"can_cancel": true,
"estimated_time": 180,
"video_duration": 10
},
"usage": {
"billing_rule": "per_second",
"credits_reserved": 60,
"user_group": "default"
}
}
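Since the response is an async task object, a typical client polls until the status becomes terminal. The loop below is a sketch; the exact status endpoint is not shown here, so the fetcher is passed in as a callable (see the Async Tasks docs for the real retrieval call):

```python
import time

# Generic poll loop for the async task object above. `fetch_task` is any
# callable returning the current task dict; the actual status endpoint is
# an assumption here -- see the Async Tasks / Webhooks docs.
TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_task(fetch_task, interval=5, timeout=600):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = fetch_task()
        if task.get("status") in TERMINAL:
            return task
        time.sleep(interval)
    raise TimeoutError("task did not finish in time")

# Example with a stubbed fetcher that completes on the second poll:
states = iter([{"status": "processing"}, {"status": "completed", "progress": 100}])
done = wait_for_task(lambda: next(states), interval=0)
```

In production, prefer the callback_url webhook over polling when your server can receive HTTPS callbacks.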
Billing Notes
- Per-second billing based on the output video's duration
- Reference video input duration also counts toward billing (a 10-second reference video bills at 10 seconds of input)
- Audio generation itself is free of extra charge
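As a rough illustration of the per-second rule: the sample response reserves 60 credits for a 10-second task, suggesting 6 credits/second. Both that rate and billing input seconds at the same rate are assumptions here; check your plan's actual pricing.

```python
# Rough cost estimate under the per-second rule. The 6 credits/second rate
# is inferred from the sample response (60 credits reserved for 10 s) and
# is an ASSUMPTION, as is applying the same rate to reference-video input.
CREDITS_PER_SECOND = 6  # assumed from the example response above

def estimate_credits(output_seconds, reference_video_seconds=0):
    return (output_seconds + reference_video_seconds) * CREDITS_PER_SECOND

estimate_credits(10)                              # output only
estimate_credits(10, reference_video_seconds=10)  # plus a 10 s reference clip
```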
FAQ
Do the reference assets appear directly in the output?
No. The model treats them as signals for style / composition / motion / rhythm; the final output is fully generated new content.
Can I send the request without any reference assets?
Yes — this acts like pure text-to-video. But if you have no references, use the cheaper text-to-video model directly.
Does asset order matter?
Yes. If your prompt says "video 1", the model maps that to video_urls[0]. Keeping a stable order makes experiments reproducible.
Related
- Models Overview
- Text-to-Video API
- Image-to-Video API
- Fast Models — seedance-2.0-fast-reference-to-video
- Async Tasks / Webhooks