Reference-to-Video API

reference-to-video is the most powerful mode in Seedance 2.0. A single request can include up to 9 reference images, 3 reference videos, and 3 reference audio clips, and the model synthesizes a new video guided by all of them.

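Typical scenarios:

Style reference: a few images that define a specific art style; the new video reflects that style
Character/product reference: keep the same virtual character or product appearing across new scenes and actions
Cinematography reference: a demo video that conveys the camera movement and motion you want
Music-driven pacing: a reference audio clip that drives the visual rhythm and mood
Video editing/extension: continue, extend, or rewrite existing footage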

Endpoint

POST https://api.evolink.ai/v1/videos/generations

Model ID: seedance-2.0-reference-to-video

The Fast variant is seedance-2.0-fast-reference-to-video. Its parameter structure is identical.

Request parameters

Parameter	Type	Required	Default	Description
`model`	string	Yes	-	Set to `seedance-2.0-reference-to-video`
`prompt`	string	Yes	-	Description of the video. State in natural language how each reference asset should be used (e.g. "use video 1's first-person perspective, audio 1 as background music throughout"). Up to 500 Chinese characters or 1000 English words
`image_urls`	array<string>	No	-	0 to 9 reference image URLs
`video_urls`	array<string>	No	-	0 to 3 reference video URLs
`audio_urls`	array<string>	No	-	0 to 3 reference audio URLs
`duration`	integer	No	`5`	Video length in seconds, `4` to `15`
`quality`	string	No	`720p`	`480p` or `720p`
`aspect_ratio`	string	No	`16:9`	`16:9`, `9:16`, `1:1`, `4:3`, `3:4`, `21:9`, `adaptive`
`generate_audio`	boolean	No	`true`	Whether to generate synchronized audio
`callback_url`	string	No	-	HTTPS URL that receives the task-completion callback
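The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_payload` is a hypothetical helper name, and it assumes the server rejects out-of-range values rather than clamping them, which the table does not state.

```python
ALLOWED_QUALITY = {"480p", "720p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "adaptive"}


def build_payload(prompt, duration=5, quality="720p", aspect_ratio="16:9",
                  generate_audio=True, **optional):
    """Assemble a request body, rejecting values outside the documented ranges.

    Hypothetical helper: the defaults mirror the parameter table.
    """
    if not prompt:
        raise ValueError("prompt is required")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(duration, bool) or not isinstance(duration, int) or not 4 <= duration <= 15:
        raise ValueError("duration must be an integer from 4 to 15")
    if quality not in ALLOWED_QUALITY:
        raise ValueError(f"quality must be one of {sorted(ALLOWED_QUALITY)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")

    payload = {
        "model": "seedance-2.0-reference-to-video",
        "prompt": prompt,
        "duration": duration,
        "quality": quality,
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
    }
    # Include the optional fields only when the caller supplied a non-empty value
    for key in ("image_urls", "video_urls", "audio_urls", "callback_url"):
        if optional.get(key):
            payload[key] = optional[key]
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the Python request example.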

Important constraint: image_urls, video_urls, and audio_urls may all be empty (equivalent to pure text-to-video). However, supplying audio_urls alone is not allowed. If you provide audio, you must also provide at least one image or one video as a visual anchor.
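The combination rule can be expressed as a small pre-flight check. This is a sketch under the constraints stated here; `validate_references` is a hypothetical name, and the messages are illustrative rather than the API's own error text.

```python
def validate_references(image_urls=None, video_urls=None, audio_urls=None):
    """Return a list of problems; an empty list means the combination is allowed."""
    images = image_urls or []
    videos = video_urls or []
    audios = audio_urls or []

    problems = []
    if len(images) > 9:
        problems.append("image_urls accepts at most 9 entries")
    if len(videos) > 3:
        problems.append("video_urls accepts at most 3 entries")
    if len(audios) > 3:
        problems.append("audio_urls accepts at most 3 entries")
    # Audio needs a visual anchor: at least one image or one video
    if audios and not images and not videos:
        problems.append("audio_urls requires at least one image or video")
    return problems
```

For example, `validate_references()` returns an empty list (pure text-to-video), while `validate_references(audio_urls=["https://example.com/bgm.mp3"])` reports the missing visual anchor.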

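Assigning roles in the prompt

This model has no tag syntax (tags such as @Image1 or @Video1 do not exist). Assign each asset's role in natural language. The model resolves references such as "image 1 / video 1 / audio 1" by array order.

Common patterns: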

Intent	Recommended prompt phrasing
Use image 1 as the first frame	"Use image 1 as the first frame of the video"
Let video 1 drive the camera movement	"Replicate video 1's camera movement and pacing"
Use audio 1 as background music	"Use audio 1 as background music throughout the entire video"
Preserve the character from image 1	"The character's appearance stays consistent with image 1"
Transfer the style of image 2	"The overall art style references image 2's color palette and texture"

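These patterns can be combined freely within a single prompt. The order of assets does not affect whether a request is valid, but it does affect how the model interprets "image 1 / image 2". Keep the order stable for reproducibility.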
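Because references are resolved by position, a prompt that mentions "video 2" while only one video is supplied points at nothing. The sketch below is a hypothetical client-side lint, not something the API provides: it scans the prompt for 1-based "image N / video N / audio N" mentions and reports those without a matching array entry. It assumes the lowercase English wording shown in the table.

```python
import re

# Matches "image 1", "video 2", "audio 3", including possessives like "video 1's"
_REF_PATTERN = re.compile(r"\b(image|video|audio)\s+(\d+)\b", re.IGNORECASE)


def find_dangling_references(prompt, image_urls=(), video_urls=(), audio_urls=()):
    """Return labels mentioned in the prompt that have no matching array entry."""
    counts = {
        "image": len(image_urls),
        "video": len(video_urls),
        "audio": len(audio_urls),
    }
    dangling = []
    for kind, number in _REF_PATTERN.findall(prompt):
        kind = kind.lower()
        index = int(number)  # 1-based: "video 1" corresponds to video_urls[0]
        label = f"{kind} {index}"
        if not 1 <= index <= counts[kind] and label not in dangling:
            dangling.append(label)
    return dangling
```

Running this before each request is a cheap way to catch prompts that drifted out of sync with the asset arrays after reordering.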

Input asset limits

Images

Constraint	Limit
Count	0 to 9
Format	`.jpeg`, `.png`, `.webp`
Dimensions	300 to 6000 px per side
Aspect ratio	0.4 to 2.5
Max size per image	30 MB

Videos

Constraint	Limit
Count	0 to 3
Format	`.mp4`, `.mov`
Duration per video	2 to 15 seconds
Total duration	15 seconds or less
Resolution	480p to 720p
Frame rate	24 to 60 FPS
Max size per video	50 MB

Audio

Constraint	Limit
Count	0 to 3
Format	`.wav`, `.mp3`
Duration per clip	2 to 15 seconds
Total duration	15 seconds or less
Max size per clip	15 MB

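Overall

Constraint	Limit
Total request body	64 MB or less (inline Base64 embedding is not supported)
Minimum content when audio is supplied	At least one image or one video (audio-only requests are not allowed)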
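If the dimensions, durations, and file sizes of your assets are already known, the limits in these tables can be checked locally. The sketch below uses hypothetical helper names and takes the metadata as plain numbers; how you obtain that metadata (for example with an image or media library) is outside its scope. The 0.4 to 2.5 aspect-ratio range is symmetric (1 / 0.4 = 2.5), so the check gives the same answer whether the ratio is read as width/height or height/width.

```python
def check_image(width_px, height_px, size_mb):
    """Check one image against the documented image limits."""
    problems = []
    if not (300 <= width_px <= 6000 and 300 <= height_px <= 6000):
        problems.append("each side must be 300 to 6000 px")
    # Guard against division by zero for malformed metadata
    if height_px > 0 and not 0.4 <= width_px / height_px <= 2.5:
        problems.append("aspect ratio must be 0.4 to 2.5")
    if size_mb > 30:
        problems.append("image must be 30 MB or less")
    return problems


def check_clips(durations_s, max_clips=3):
    """Check a set of video or audio clip durations (in seconds).

    The count, per-clip, and total limits are the same for videos and audio.
    """
    problems = []
    if len(durations_s) > max_clips:
        problems.append(f"at most {max_clips} clips are allowed")
    if any(not 2 <= d <= 15 for d in durations_s):
        problems.append("each clip must be 2 to 15 seconds long")
    if sum(durations_s) > 15:
        problems.append("total duration must be 15 seconds or less")
    return problems
```

Note that the total-duration cap is the binding one in practice: two 10-second videos each pass the per-clip check but fail together.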

Request examples

cURL: three-modality synthesis (image + video + audio)

curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0-reference-to-video",
    "prompt": "Replicate video 1's first-person perspective and camera pacing. Use audio 1 as the soundtrack for the entire video. Scene: a young rider weaving through a rain-soaked city street at night, neon reflections on wet asphalt.",
    "image_urls": ["https://example.com/rider-style.jpg"],
    "video_urls": ["https://example.com/pov-reference.mp4"],
    "audio_urls": ["https://example.com/synthwave-bgm.mp3"],
    "duration": 10,
    "quality": "720p",
    "aspect_ratio": "16:9"
  }'

Python: images only (up to 9)

import requests

response = requests.post(
    "https://api.evolink.ai/v1/videos/generations",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "model": "seedance-2.0-reference-to-video",
        "prompt": "The overall art style references the color palette and texture of the 3 provided images. Scene: a small-town summer market at dusk, warm tones.",
        "image_urls": [
            "https://example.com/style-ref-1.jpg",
            "https://example.com/style-ref-2.jpg",
            "https://example.com/style-ref-3.jpg"
        ],
        "duration": 8,
        "aspect_ratio": "16:9"
    }
)

# Fail fast on HTTP errors before reading the task payload
response.raise_for_status()
task = response.json()
print(f"Task ID: {task['id']}")

Node.js: video-only reference (replicating camera movement)

const res = await fetch("https://api.evolink.ai/v1/videos/generations", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "seedance-2.0-reference-to-video",
    prompt: "Replicate video 1's orbital camera movement and velocity curve. Subject: a classical sculpture in a museum hall at dusk.",
    video_urls: ["https://example.com/orbit-shot.mp4"],
    duration: 8,
    quality: "720p",
    aspect_ratio: "16:9"
  })
});

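// Surface HTTP errors instead of parsing an error body as a task
if (!res.ok) throw new Error(`Request failed with status ${res.status}`);

const task = await res.json();
console.log("Task ID:", task.id);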

Response

{
    "id": "task-unified-1774857405-abc123",
    "object": "video.generation.task",
    "created": 1774857405,
    "model": "seedance-2.0-reference-to-video",
    "status": "pending",
    "progress": 0,
    "type": "video",
    "task_info": {
        "can_cancel": true,
        "estimated_time": 180,
        "video_duration": 10
    },
    "usage": {
        "billing_rule": "per_second",
        "credits_reserved": 60,
        "user_group": "default"
    }
}
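A client typically needs only a few of these fields. The sketch below pulls them out of a parsed response; `summarize_task` is a hypothetical helper, and the field names are taken from the sample above. Whether every field is present in every response is an assumption, so nested lookups use `.get()`.

```python
import json


def summarize_task(task):
    """Extract the fields a client usually needs from a task response dict."""
    info = task.get("task_info", {})
    usage = task.get("usage", {})
    return {
        "id": task["id"],  # required to refer to the task later
        "status": task.get("status"),
        "estimated_time": info.get("estimated_time"),
        "can_cancel": info.get("can_cancel", False),
        "credits_reserved": usage.get("credits_reserved"),
    }


# Abbreviated copy of the sample response, for illustration only
sample = json.loads("""
{
    "id": "task-unified-1774857405-abc123",
    "status": "pending",
    "task_info": {"can_cancel": true, "estimated_time": 180, "video_duration": 10},
    "usage": {"billing_rule": "per_second", "credits_reserved": 60}
}
""")

summary = summarize_task(sample)
print(summary["id"], summary["status"], summary["credits_reserved"])
```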

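Billing notes

Billing is per second, calculated from the duration of the output video
The input duration of reference videos is also billed (a 10-second reference video is billed as 10 seconds of input)
Audio generation itself incurs no additional charge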
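The billing rules above can be turned into a quick estimate of billable seconds. This sketch stops at seconds and does not convert to credits, because the per-second rate, and whether input and output seconds are priced the same, are not stated here. `billable_seconds` is a hypothetical helper name.

```python
def billable_seconds(output_duration_s, reference_video_durations_s=()):
    """Split billable time into output and input seconds.

    Reference audio and images add no seconds under the rules stated above.
    """
    input_seconds = sum(reference_video_durations_s)
    return {
        "output_seconds": output_duration_s,
        "input_seconds": input_seconds,
        "total_seconds": output_duration_s + input_seconds,
    }
```

For a 10-second output driven by one 10-second reference video, this gives 10 output seconds plus 10 input seconds.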

FAQ

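Do reference assets appear verbatim in the output? No. The model treats them as signals for style, composition, motion, and rhythm; the final output is entirely newly generated content.

Can I send a request with no reference assets? Yes. In that case it behaves like pure text-to-video. If you have no references, however, use the cheaper text-to-video directly.

Does asset order matter? Yes. When the prompt says "video 1", the model maps it to video_urls[0]. Keeping the order stable makes experiments more reproducible.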