Image-to-Video API

Turn 1 or 2 images into a video. The behavior is determined by how many images you pass:

  • 1 image: First-frame mode. The image becomes the first frame of the video; the model generates forward motion from it.
  • 2 images: First-last-frame mode. The first image opens the video and the second image closes it; the model generates the transition animation between them.

Endpoint

```
POST https://api.evolink.ai/v1/videos/generations
```

Model ID: `seedance-2.0-image-to-video`

The Fast variant `seedance-2.0-fast-image-to-video` auto-detects first-frame vs first-last-frame mode based on the image count.

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | | Must be `seedance-2.0-image-to-video` |
| `prompt` | string | Yes | | Natural-language description of motion / camera / atmosphere. ≤ 500 Chinese characters or ≤ 1000 English words |
| `image_urls` | `array<string>` | Yes | | 1 or 2 publicly accessible image URLs |
| `duration` | integer | No | 5 | Video duration in seconds, range 4–15 |
| `quality` | string | No | 720p | `480p` or `720p` |
| `aspect_ratio` | string | No | 16:9 | `16:9`, `9:16`, `1:1`, `4:3`, `3:4`, `21:9`, `adaptive` |
| `generate_audio` | boolean | No | true | Whether to generate synchronized audio |
| `callback_url` | string | No | | HTTPS URL for task-completion callback, max 2048 characters |

Note: Images are passed as URLs only — Base64 inlining is not supported. URLs must be publicly GET-able without authentication and must not redirect to login pages.
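Since a URL behind authentication or a login redirect will fail the task, it can be worth sanity-checking each URL before submitting. A minimal sketch using `requests` — the helper name and the login-redirect heuristic are illustrative assumptions, not part of this API:

```python
import requests

def is_publicly_fetchable(url: str, timeout: float = 10.0) -> bool:
    """Heuristic check that `url` answers a plain, unauthenticated GET
    with an image, without redirecting to a login page."""
    try:
        resp = requests.get(url, timeout=timeout, allow_redirects=True, stream=True)
    except requests.RequestException:
        return False
    if resp.status_code != 200:
        return False
    # If we were redirected and landed on something that looks like a
    # login page, treat the URL as not publicly fetchable.
    if resp.history and "login" in resp.url.lower():
        return False
    # The endpoint expects an image, so require an image/* content type.
    return resp.headers.get("Content-Type", "").startswith("image/")
```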

Image Input Requirements

| Constraint | Limit |
|---|---|
| Count | 1 or 2 images |
| Format | .jpeg, .png, .webp |
| Dimensions | 300–6000 px per side |
| Aspect ratio | 0.4–2.5 (i.e. 2:5 to 5:2) |
| Max size per image | 30 MB |
| Total request body | ≤ 64 MB |

Any request exceeding these limits returns `invalid_request`. Realistic human faces are not supported — the system rejects them automatically.

First-Frame Mode (1 image)

```shell
curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0-image-to-video",
    "prompt": "The camera slowly pushes in and the scene comes alive, with wind gently moving the grass in the background.",
    "image_urls": ["https://example.com/first-frame.jpg"],
    "duration": 5,
    "aspect_ratio": "adaptive"
  }'
```

Setting `aspect_ratio` to `adaptive` automatically matches the output's aspect ratio to the input image.

First-Last-Frame Mode (2 images)

```shell
curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0-image-to-video",
    "prompt": "A smooth transition from the sunrise ocean to the sunset ocean in the same location",
    "image_urls": [
      "https://example.com/sunrise.jpg",
      "https://example.com/sunset.jpg"
    ],
    "duration": 6,
    "quality": "720p",
    "aspect_ratio": "16:9"
  }'
```

Both images should have similar dimensions and aspect ratios — otherwise the model may produce distortion during the transition.
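A quick compatibility check on the two frames before submitting can catch this. A sketch with Pillow — the 5% relative tolerance is an arbitrary choice for illustration, not an API rule:

```python
from PIL import Image

def frames_compatible(first_path: str, last_path: str,
                      tolerance: float = 0.05) -> bool:
    """True if the two images' aspect ratios differ by at most
    `tolerance` (relative), i.e. they are safe to pair as first/last frames."""
    with Image.open(first_path) as a, Image.open(last_path) as b:
        ratio_a = a.width / a.height
        ratio_b = b.width / b.height
    return abs(ratio_a - ratio_b) / max(ratio_a, ratio_b) <= tolerance
```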

Python Example

```python
import requests

response = requests.post(
    "https://api.evolink.ai/v1/videos/generations",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "model": "seedance-2.0-image-to-video",
        "prompt": "The model slowly turns, hair flowing gently in the wind",
        "image_urls": ["https://example.com/portrait.jpg"],
        "duration": 5,
        "quality": "720p"
    },
    timeout=30
)
response.raise_for_status()  # surface HTTP errors instead of failing on JSON parse

task = response.json()
print(f"Task ID: {task['id']}")
```

Response

```json
{
    "id": "task-unified-1774857405-abc123",
    "object": "video.generation.task",
    "created": 1774857405,
    "model": "seedance-2.0-image-to-video",
    "status": "pending",
    "progress": 0,
    "type": "video",
    "task_info": {
        "can_cancel": true,
        "estimated_time": 165,
        "video_duration": 5
    },
    "usage": {
        "billing_rule": "per_second",
        "credits_reserved": 50,
        "user_group": "default"
    }
}
```

Field semantics are identical to other Seedance 2.0 models — see Async Tasks for the full lifecycle.
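Since generation is asynchronous, a client typically polls the task until it leaves its in-flight states. A minimal polling sketch — the GET path is assumed by analogy with the POST endpoint, and status names other than `pending` are assumptions; check Async Tasks for the actual lifecycle and endpoints:

```python
import time
import requests

API_KEY = "YOUR_API_KEY"
# Assumed polling endpoint, derived from the POST path -- not confirmed here.
STATUS_URL = "https://api.evolink.ai/v1/videos/generations/{task_id}"

def wait_for_video(task_id: str, poll_interval: float = 10.0,
                   timeout: float = 600.0) -> dict:
    """Poll the task until it leaves the in-flight states, then return it."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(
            STATUS_URL.format(task_id=task_id),
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        task = resp.json()
        # "pending" appears in the sample response; "processing" is assumed.
        if task["status"] not in ("pending", "processing"):
            return task
        time.sleep(poll_interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```

The `estimated_time` field in the response (seconds) is a reasonable starting point for choosing the poll interval.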

FAQ

What happens if I pass 3 images? The request is rejected with `invalid_request` — image-to-video strictly requires 1 or 2 images. If you need more than 2 images as style or subject references, use Reference-to-Video.

Do the image URLs have to be self-hosted? Not required. Any publicly GET-able URL works. For production pipelines that need reproducibility, host images on your own object storage (OSS / S3 / R2) to avoid third-party URL expiration.

Will the output preserve human faces from the input? If the input image contains a realistic human face, the request is rejected outright. For face-consistent virtual characters, synthesize non-realistic faces with another tool first, then feed them to this API.