Fast Models

Every Seedance 2.0 mode ships in two tiers: Standard and Fast. The Fast family trades a small amount of visual quality for faster generation and lower per-second pricing — perfect for rapid iteration, bulk production, and A/B testing.

The Three Fast Models

Fast Model ID                           Corresponding Standard              Use for
seedance-2.0-fast-text-to-video         seedance-2.0-text-to-video          Pure text-to-video
seedance-2.0-fast-image-to-video        seedance-2.0-image-to-video         1- or 2-image driven
seedance-2.0-fast-reference-to-video    seedance-2.0-reference-to-video     Multimodal composition

All Fast models share the same endpoint and parameter structure:

POST https://api.evolink.ai/v1/videos/generations

Differences from Standard

Identical:

  • Endpoint
  • Request body schema (all parameter names, types, default values)
  • Allowed quality tiers (480p / 720p), duration range (4–15 seconds), aspect ratios
  • Input asset count and format limits
  • Response schema, task lifecycle, webhook payload format
  • prompt length limit (500 Chinese chars / 1000 English words)
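
Because these limits are identical across tiers, one client-side pre-flight check can serve both. A minimal sketch — the helper name and error messages are illustrative, not part of the API:

```python
# Illustrative pre-flight validator for the limits shared by Standard and Fast.
# Only keys present in the payload are checked; nothing here is an official SDK.
ALLOWED_QUALITY = {"480p", "720p"}

def validate_request(payload: dict) -> None:
    if "quality" in payload and payload["quality"] not in ALLOWED_QUALITY:
        raise ValueError(f"quality must be one of {sorted(ALLOWED_QUALITY)}")
    if "duration" in payload and not 4 <= payload["duration"] <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
```

Running this before submitting a task catches out-of-range values locally instead of burning an API round trip.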

Different:

  • Faster generation
  • Lower per-second pricing
  • Slightly less visual detail than Standard (usually indistinguishable to the eye)
  • fast-image-to-video auto-detects first-frame vs first-last-frame mode based on image count (1 = first-frame driven, 2 = first-last-frame transition); no extra field needed

A typical production pipeline uses both tiers:

Iterate on prompts / parameters   (Fast model — quick & cheap)
    ↓
Identify the prompts and params you're happy with   (swap the `model` field, keep everything else)
    ↓
Final delivery   (Standard model — highest-quality render)

Code-wise, the only change is the model string — no other logic needs to change.

curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0-fast-text-to-video",
    "prompt": "A commercial introducing the latest 2026 electric sports car, highlighting its aerodynamic design and cabin tech.",
    "duration": 6,
    "quality": "720p",
    "aspect_ratio": "16:9",
    "generate_audio": true,
    "model_params": {
      "web_search": true
    }
  }'

model_params.web_search is exclusive to the text-to-video family (including Fast). It's only billed when a search is actually triggered.
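
Since every Fast model ID differs from its Standard counterpart only by the "fast-" segment, the tier swap can be a one-line string replacement. A sketch — the helper name is illustrative, not part of any SDK:

```python
# Only the "fast-" segment of the model ID differs between tiers, so a final
# render reuses the exact same payload with the model string swapped.
def to_standard(fast_model: str) -> str:
    return fast_model.replace("seedance-2.0-fast-", "seedance-2.0-", 1)

payload = {
    "model": "seedance-2.0-fast-text-to-video",
    "prompt": "A commercial introducing the latest 2026 electric sports car",
    "duration": 6,
    "quality": "720p",
}

# Final delivery: same payload, Standard tier.
final_payload = {**payload, "model": to_standard(payload["model"])}
```

Everything except `model` — prompt, duration, quality, aspect ratio — carries over untouched, which is exactly what makes the Fast-to-Standard handoff safe.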

Example: Fast Image-to-Video (auto first-frame / first-last-frame)

import requests

# 1 image → first-frame driven
response = requests.post(
    "https://api.evolink.ai/v1/videos/generations",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "model": "seedance-2.0-fast-image-to-video",
        "prompt": "Camera slowly pushes in, the scene comes alive",
        "image_urls": ["https://example.com/scene.jpg"],
        "duration": 5
    }
)

# 2 images → auto switches to first-last-frame transition
response = requests.post(
    "https://api.evolink.ai/v1/videos/generations",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "model": "seedance-2.0-fast-image-to-video",
        "prompt": "A smooth transition between two scenes",
        "image_urls": [
            "https://example.com/first.jpg",
            "https://example.com/last.jpg"
        ],
        "duration": 6
    }
)
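
The auto-detection rule — 1 image means first-frame driven, 2 means first-last-frame transition — can be captured in a small payload builder. The helper is illustrative; the API itself needs no extra field:

```python
# Illustrative builder for fast-image-to-video payloads. The model infers the
# mode from the image count, so the builder only validates and assembles.
def build_fast_i2v_payload(prompt: str, image_urls: list, duration: int = 5) -> dict:
    # 1 image  -> first-frame driven
    # 2 images -> first-last-frame transition
    if len(image_urls) not in (1, 2):
        raise ValueError("fast-image-to-video accepts exactly 1 or 2 images")
    return {
        "model": "seedance-2.0-fast-image-to-video",
        "prompt": prompt,
        "image_urls": list(image_urls),
        "duration": duration,
    }
```

The resulting dict is what the `json=` argument in the requests above expects.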

Example: Fast Reference-to-Video

curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0-fast-reference-to-video",
    "prompt": "Replicate video 1's first-person perspective. Use audio 1 as background music throughout. Promo video opening.",
    "image_urls": ["https://example.com/ref1.jpg"],
    "video_urls": ["https://example.com/reference.mp4"],
    "audio_urls": ["https://example.com/bgm.mp3"],
    "duration": 10,
    "quality": "720p",
    "aspect_ratio": "16:9"
  }'

When Not to Use Fast

  • Final ad deliverables / brand hero films — choose Standard for more stable detail
  • Close-up face or micro-expression shots — Standard is more precise
  • Complex reference-to-video composition with 9 images + 3 videos + 3 audio clips — Standard comprehends the combined signals better
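
These guidelines can be rolled into a simple tier-selection heuristic. The function and its threshold are illustrative assumptions, not an official rule:

```python
# Illustrative heuristic mirroring the guidelines above; the asset-count
# threshold (a heavy composition like 9 images + 3 videos + 3 audio = 15
# assets) is an assumption, not an API limit.
def pick_tier(final_delivery: bool = False,
              close_up_faces: bool = False,
              num_reference_assets: int = 0) -> str:
    if final_delivery or close_up_faces or num_reference_assets >= 15:
        return "standard"
    return "fast"
```

Defaulting to Fast and escalating to Standard only for final renders, face close-ups, or heavy multimodal composition keeps iteration cheap without sacrificing delivery quality.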