FFmpeg Extract Frames from Video: Every Method

The quick answer

Extract one frame per second from a video:

ffmpeg -i input.mp4 -vf "fps=1" frame_%04d.png

Extract just the first frame:

ffmpeg -i input.mp4 -frames:v 1 thumbnail.jpg

Extract a frame at a specific timestamp:

ffmpeg -ss 00:00:45 -i input.mp4 -frames:v 1 frame.jpg

Those three commands cover maybe 80% of use cases when you need to extract frames from video with FFmpeg. The other 20% is where it gets interesting: keyframe extraction, scene detection, quality tuning, and doing all of it across hundreds of videos without babysitting a terminal.

How FFmpeg frame extraction actually works

Every video is a sequence of compressed frames. Not all frames are created equal, though. There are three types:

I-frames (keyframes): Complete images. These decode on their own.
P-frames: Store only the differences from the previous frame. Cheaper to encode, but FFmpeg needs the prior frame to reconstruct them.
B-frames: Reference both previous and future frames. Most compact, slowest to decode in isolation.

When you ask FFmpeg to extract a frame, it decodes from the nearest keyframe forward until it reaches the target. This has two practical consequences:

Seeking is not instant. Put -ss before -i for fast (but slightly imprecise) seeking that uses the container's index. Put it after -i for frame-accurate seeking that decodes everything from the start. The video trimming guide goes deeper on input vs output seeking and when each approach breaks down.
Frame count adds up fast. A 60-second video at 30fps has 1,800 frames. A 5-minute video at 30fps produces 9,000. Extract all of them as PNG and you're looking at 10-45GB of disk space. Be deliberate about what you extract.

Extract the first frame

ffmpeg -i input.mp4 -frames:v 1 first_frame.jpg

-frames:v 1 tells FFmpeg to stop after one video frame. This is the fastest extraction you can do. FFmpeg decodes just enough to produce one image.

Change the extension to .png for lossless output, or .webp for smaller files with comparable visual quality.

One thing to watch out for: some videos start with a few black frames (intros, fade-ins). If your first frame is just black, skip ahead a couple seconds:

ffmpeg -ss 00:00:03 -i input.mp4 -frames:v 1 first_frame.jpg

Extract a frame at a specific timestamp

ffmpeg -ss 00:01:30 -i input.mp4 -frames:v 1 frame_90s.jpg

Putting -ss before -i (input seeking) makes FFmpeg jump to the timestamp using the container's index. It's fast but might land on the nearest keyframe rather than the exact frame. For thumbnails and preview images, that's fine. Nobody notices a 1-2 second offset.

If you need frame-exact accuracy:

ffmpeg -i input.mp4 -ss 00:01:30 -frames:v 1 frame_90s.jpg

This is slower because FFmpeg decodes everything from the start up to your timestamp. For a 2-hour video, that means decoding 1 hour and 30 minutes of frames just to get one image. Use this when precision matters (aligning frames to subtitles, syncing to specific audio cues).

You can also combine both for a middle-ground approach:

ffmpeg -ss 00:01:25 -i input.mp4 -ss 00:00:05 -frames:v 1 frame_90s.jpg

This seeks to 1:25 at the input level (fast), then decodes 5 more seconds at the output level for accuracy. Same technique described in the trim guide for cutting clips from large files.

Extract frames at regular intervals

One frame per second:

ffmpeg -i input.mp4 -vf "fps=1" frames/frame_%04d.png

One frame every 5 seconds:

ffmpeg -i input.mp4 -vf "fps=1/5" frames/frame_%04d.png

One frame every 30 seconds:

ffmpeg -i input.mp4 -vf "fps=1/30" frames/frame_%04d.png

The fps filter resamples the video to your target frame rate. fps=1 means one frame per second. fps=1/5 means one frame every five seconds. The math is straightforward: it's frames per second, so fractions give you longer intervals.

If you want 10 frames per second (useful for creating sprite sheets or smooth previews):

ffmpeg -i input.mp4 -vf "fps=10" frames/frame_%04d.png

A quick note on the %04d pattern: it's a C-style format string. %04d produces zero-padded 4-digit numbers (0001, 0002, ..., 9999). If you're extracting more than 9,999 frames, bump it to %06d. FFmpeg will silently overwrite frames if the counter wraps.

For more filter examples, the 50-command cheat sheet covers fps, scale, overlay, and other common video filters.

Extract all frames

ffmpeg -i input.mp4 frames/frame_%04d.png

No filter, no frame limit. FFmpeg extracts every single frame. A 10-second clip at 24fps produces 240 images. A 5-minute video at 30fps produces 9,000.

Only do this for short clips or workflows that genuinely need every frame (frame-by-frame analysis, ML training datasets, rotoscoping). For anything else, use the fps filter to sample at a reasonable rate.

To check how many frames you'll get before running the extraction:

ffprobe -v error -count_frames -select_streams v:0 \
  -show_entries stream=nb_read_frames -of csv=p=0 input.mp4

This decodes the entire video (takes a moment for long files) but gives you the exact frame count. Useful before committing to a 50,000-frame extraction. The ffprobe tutorial covers more ways to inspect stream and frame data, including keyframe distribution and packet analysis.

fps= vs select=: which is faster?

This comes up constantly on Stack Overflow. Both fps= and select= can extract frames at intervals, but they work differently under the hood.

fps= filter resamples the entire video to a new frame rate. FFmpeg decodes all frames, picks the nearest frame to each tick, and outputs it.

ffmpeg -i input.mp4 -vf "fps=1" frames/%04d.png

select= filter evaluates an expression for each frame and decides whether to output it. Combined with -vsync vfr, it only outputs selected frames.

ffmpeg -i input.mp4 -vf "select='not(mod(n\,30))'" -vsync vfr frames/%04d.png

This extracts every 30th frame. n is the frame number, mod(n,30) is zero for every 30th frame, not() inverts it to select those frames.

Which is faster? For sparse extraction (one frame every few seconds from a long video), select= with -vsync vfr is faster because FFmpeg can skip over frames it doesn't need. The fps= filter decodes everything regardless.

In practice, the difference is negligible for short clips. Both finish a 2-minute video in under a second. Where it matters is longer files: extracting one frame every 10 seconds from a 2-hour 1080p video, select= can finish 2-3x faster than fps= because it's skipping decode work on ~99% of frames.

One gotcha with select=: always add -vsync vfr. Without it, FFmpeg duplicates frames to maintain a constant frame rate output, which defeats the purpose and produces duplicate images.

The FFmpeg commands reference lists both approaches with ready-to-use API examples.

Extract keyframes only

Keyframes (I-frames) are fully encoded frames that don't depend on other frames. They decode instantly because there's no inter-frame prediction to resolve, and they're typically the sharpest frames in the video.

ffmpeg -skip_frame nokey -i input.mp4 -vsync vfr -frames:v 10 keyframes/kf_%04d.png

-skip_frame nokey tells the decoder to skip everything except keyframes. This is significantly faster than normal decoding because FFmpeg doesn't process P-frames or B-frames at all.

To extract all keyframes (not just the first 10), drop the -frames:v limit:

ffmpeg -skip_frame nokey -i input.mp4 -vsync vfr keyframes/kf_%04d.png

Real-world keyframe intervals vary. Most H.264/H.265 videos use a GOP (Group of Pictures) size of 48-250 frames. At 30fps, that translates to a keyframe every 1.6 to 8.3 seconds. A 5-minute video typically yields 35-180 keyframes depending on the encoder settings.

You can check where keyframes fall with ffprobe:

ffprobe -select_streams v:0 -show_entries frame=pts_time,key_frame \
  -of csv=p=0 input.mp4 | grep ",1$" | head -20

When to use keyframe extraction:

Generating video thumbnails where any representative frame works
Quick visual audit of a video's content without watching it
Building a visual index or storyboard
Pre-filtering before running expensive operations like scene detection

Scene detection frame extraction

FFmpeg's select filter has a scene function that scores each frame for visual change compared to the previous frame. Values range from 0 (identical) to 1 (completely different).

ffmpeg -i input.mp4 -vf "select='gt(scene\,0.3)'" -vsync vfr scenes/scene_%04d.jpg

This extracts frames where the scene change score exceeds 0.3. Lower values (0.1-0.2) catch more transitions including slow dissolves. Higher values (0.4-0.5) only trigger on hard cuts.

The right threshold depends on the content:

Talking head videos: 0.3-0.4 catches camera angle changes without grabbing every head movement
Music videos or action sequences: 0.2-0.25 to catch rapid cuts
Slideshows or presentations: 0.15-0.2 catches slide transitions reliably
Surveillance/dashcam footage: 0.4-0.5 to only flag significant changes

You can combine scene detection with a minimum interval to avoid extracting a burst of frames during a montage:

ffmpeg -i input.mp4 -vf "select='gt(scene\,0.3)*isnan(prev_selected_t)+gte(t-prev_selected_t\,2)*gt(scene\,0.3)'" -vsync vfr scenes/scene_%04d.jpg

That expression ensures at least 2 seconds between extracted frames, even during rapid scene changes. The expression is ugly, but select filter logic often is. Here's what it does:

isnan(prev_selected_t) is true for the very first frame (no previous selection exists)
gte(t-prev_selected_t,2) is true when at least 2 seconds have passed since the last selected frame
Both branches require gt(scene,0.3), the scene change threshold

JPEG vs PNG vs WebP: choosing the right format

The output format matters more than people think, especially at scale.

PNG is lossless. Every pixel is preserved exactly. A single 1080p frame is typically 2-5MB. Use PNG when you need exact pixel data: computer vision pipelines, ML training, frame-accurate editing.

ffmpeg -i input.mp4 -frames:v 1 frame.png

JPEG is lossy but practical. A 1080p frame runs 50-200KB at default quality. The -q:v flag controls quality on a scale of 1 (best) to 31 (worst):

# High quality — good for archiving or when quality matters
ffmpeg -i input.mp4 -frames:v 1 -q:v 2 frame.jpg

# Medium quality — solid for thumbnails and previews
ffmpeg -i input.mp4 -frames:v 1 -q:v 5 frame.jpg

# Low quality — preview grids, visual indexes
ffmpeg -i input.mp4 -frames:v 1 -q:v 15 frame.jpg

WebP gives better compression than JPEG at equivalent visual quality. Typical file sizes are 25-35% smaller. Browser support is universal at this point (even Safari since 2020), though some image processing libraries still lag.

ffmpeg -i input.mp4 -frames:v 1 -quality 80 frame.webp

For web thumbnails, JPEG at -q:v 2 to -q:v 5 is the practical choice. You get good quality at reasonable file sizes, and literally every system on earth can handle a JPEG. Use WebP if you control the display environment and want smaller files.

Here's a rough comparison for a typical 1080p frame:

Format	Quality setting	File size	Notes
PNG	lossless	3-5 MB	Pixel-perfect, largest
JPEG	`-q:v 2`	150-200 KB	High quality, good default
JPEG	`-q:v 5`	80-120 KB	Visually indistinguishable from q:v 2 for thumbnails
JPEG	`-q:v 15`	30-50 KB	Noticeable artifacts on close inspection
WebP	`-quality 80`	60-90 KB	Comparable to JPEG q:v 3

Batch extraction: multiple videos at once

When you need to extract frames from a folder of videos, a shell loop works fine:

mkdir -p frames

for video in videos/*.mp4; do
  name=$(basename "$video" .mp4)
  mkdir -p "frames/$name"
  ffmpeg -i "$video" -vf "fps=1" -q:v 2 "frames/$name/frame_%04d.jpg"
done

This extracts one frame per second from each MP4 in the videos/ directory and saves them in named subdirectories.

For keyframe thumbnails (one representative frame per video):

mkdir -p thumbnails

for video in videos/*.mp4; do
  name=$(basename "$video" .mp4)
  ffmpeg -ss 00:00:03 -i "$video" -frames:v 1 -q:v 2 "thumbnails/$name.jpg"
done

Grabs a frame at the 3-second mark from each video. The 3-second offset skips black intro frames that many videos start with.

To run extractions in parallel (useful with multi-core machines):

find videos/ -name "*.mp4" | xargs -P 4 -I {} bash -c '
  name=$(basename "{}" .mp4)
  mkdir -p "frames/$name"
  ffmpeg -i "{}" -vf "fps=1" -q:v 2 "frames/$name/frame_%04d.jpg" -loglevel error
'

-P 4 runs 4 FFmpeg processes simultaneously. Adjust based on your CPU cores and available memory. Each FFmpeg instance doing 1080p decode uses roughly 200-400MB of RAM.

This approach works for tens or maybe low hundreds of videos on a decent machine. Beyond that, or when extraction is triggered by user uploads in a web app, you want cloud infrastructure. That's where an API comes in.

Extract frames via API

Running FFmpeg locally works until you need to process at scale: hundreds of videos, extraction triggered by uploads, or frames needed across a distributed system. At that point, you're managing FFmpeg installations, worker queues, disk space, and CPU allocation.

RenderIO's FFmpeg API runs your FFmpeg commands in the cloud. Same syntax, no infrastructure to manage. Grab an API key and the examples below work as-is.

Extract one frame at the 5-second mark:

curl -X POST https://renderio.dev/api/v1/run-ffmpeg-command \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: ffsk_your_api_key" \
  -d '{
    "ffmpeg_command": "-ss 00:00:05 -i {{in_video}} -frames:v 1 -q:v 2 {{out_thumb}}",
    "input_files": {"in_video": "https://your-bucket.s3.amazonaws.com/video.mp4"},
    "output_files": {"out_thumb": "thumbnail.jpg"}
  }'

Extract frames at 1fps from a remote video:

curl -X POST https://renderio.dev/api/v1/run-ffmpeg-command \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: ffsk_your_api_key" \
  -d '{
    "ffmpeg_command": "-i {{in_video}} -vf fps=1 -q:v 2 {{out_frames}}",
    "input_files": {"in_video": "https://your-bucket.s3.amazonaws.com/video.mp4"},
    "output_files": {"out_frames": "frame_%04d.jpg"}
  }'

Scene detection via API, extracting only visually distinct frames:

curl -X POST https://renderio.dev/api/v1/run-ffmpeg-command \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: ffsk_your_api_key" \
  -d '{
    "ffmpeg_command": "-i {{in_video}} -vf \"select=gt(scene\\,0.3)\" -vsync vfr {{out_frames}}",
    "input_files": {"in_video": "https://your-bucket.s3.amazonaws.com/video.mp4"},
    "output_files": {"out_frames": "scene_%04d.jpg"}
  }'

The API downloads your input, runs the command, and stores the output files. Poll the command status endpoint or set up a webhook to get notified when it's done. The curl examples guide has more patterns like this.

Extract frames with Python

If you're building a pipeline (handling uploads, generating training data, creating video previews), you probably want Python, not shell scripts. The Python FFmpeg API tutorial covers the full setup. Here's frame extraction specifically:

import requests
import time

API_KEY = "ffsk_your_api_key"
BASE_URL = "https://renderio.dev/api/v1"
HEADERS = {"Content-Type": "application/json", "X-API-KEY": API_KEY}

def extract_frames(video_url, fps=1, quality=2):
    """Extract frames from a video at the given FPS."""
    response = requests.post(
        f"{BASE_URL}/run-ffmpeg-command",
        headers=HEADERS,
        json={
            "ffmpeg_command": f"-i {{in_video}} -vf fps={fps} -q:v {quality} {{out_frames}}",
            "input_files": {"in_video": video_url},
            "output_files": {"out_frames": "frame_%04d.jpg"}
        }
    )
    command_id = response.json()["command_id"]

    # Poll until complete
    while True:
        status = requests.get(
            f"{BASE_URL}/commands/{command_id}",
            headers=HEADERS
        ).json()
        if status["status"] == "SUCCESS":
            return status["output_files"]
        if status["status"] == "FAILED":
            raise Exception(f"Extraction failed: {status.get('error_message')}")
        time.sleep(2)

# Extract 1 frame per second
output = extract_frames("https://your-bucket.s3.amazonaws.com/video.mp4")
for alias, file_meta in output.items():
    print(f"{alias}: {file_meta['storage_url']}")

# Extract keyframes via scene detection
response = requests.post(
    f"{BASE_URL}/run-ffmpeg-command",
    headers=HEADERS,
    json={
        "ffmpeg_command": '-i {{in_video}} -vf "select=gt(scene\\,0.3)" -vsync vfr {{out_frames}}',
        "input_files": {"in_video": "https://your-bucket.s3.amazonaws.com/video.mp4"},
        "output_files": {"out_frames": "scene_%04d.jpg"}
    }
)

For batch processing, concurrent.futures lets you process multiple videos in parallel. Each API call runs on its own cloud worker, so you're not bottlenecked by local CPU:

from concurrent.futures import ThreadPoolExecutor, as_completed

video_urls = [
    "https://bucket.s3.amazonaws.com/video1.mp4",
    "https://bucket.s3.amazonaws.com/video2.mp4",
    "https://bucket.s3.amazonaws.com/video3.mp4",
]

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(extract_frames, url): url for url in video_urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            output = future.result()
            print(f"{url}: {len(output)} output files")
        except Exception as e:
            print(f"{url}: failed — {e}")

Extract frames with Node.js

Same approach in JavaScript. The Node.js FFmpeg API guide has the full walkthrough.

const API_KEY = "ffsk_your_api_key";
const BASE_URL = "https://renderio.dev/api/v1";

async function extractFrames(videoUrl, fps = 1) {
  const response = await fetch(`${BASE_URL}/run-ffmpeg-command`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-API-KEY": API_KEY },
    body: JSON.stringify({
      ffmpeg_command: `-i {{in_video}} -vf fps=${fps} -q:v 2 {{out_frames}}`,
      input_files: { in_video: videoUrl },
      output_files: { out_frames: "frame_%04d.jpg" },
    }),
  });
  const { command_id } = await response.json();

  // Poll until done
  while (true) {
    const status = await fetch(`${BASE_URL}/commands/${command_id}`, {
      headers: { "X-API-KEY": API_KEY },
    }).then((r) => r.json());

    if (status.status === "SUCCESS") return status.output_files;
    if (status.status === "FAILED") throw new Error(status.error_message);
    await new Promise((r) => setTimeout(r, 2000));
  }
}

// Usage
const output = await extractFrames(
  "https://bucket.s3.amazonaws.com/video.mp4"
);
for (const [alias, file] of Object.entries(output)) {
  console.log(`${alias}: ${file.storage_url}`);
}

Common use cases

The most common reason to extract frames is generating video thumbnails. Grab a frame at a fixed offset (3-5 seconds in) or use scene detection to find something visually interesting. Every video platform does this. The e-commerce video processing guide walks through the full pipeline for product video thumbnails.

Sprite sheets are another big one. Extract frames at regular intervals (every 5-10 seconds), stitch them into a grid image with ImageMagick or a canvas library, and you've got the preview thumbnails that show up when you hover over a video timeline. YouTube and Netflix both do this.

For ML training data, you'll want to extract all frames or sample at a high rate in PNG format to preserve pixel data. Scene detection helps here too: it filters out near-duplicate frames that waste training budget and bias your model toward static scenes.

Keyframe extraction is useful for quality inspection. Pull the keyframes, scan them for encoding artifacts or corruption, and you've audited the video without watching the whole thing. The video compression guide pairs well with this for diagnosing post-transcode quality issues.

Content moderation follows a similar pattern: sample frames at intervals, run them through an image classifier. Scene detection catches transitions where content might change, so you're more likely to flag problematic sections.

For e-commerce product videos, keyframe extraction (select=eq(pict_type,I)) tends to produce the sharpest frames for product listing thumbnails. The e-commerce video processing guide covers the full pipeline.

Troubleshooting

Output frames are blank or garbled. You're probably using stream copy (-c copy) with a non-keyframe seek position. Drop -c copy and let FFmpeg decode/re-encode the frame.

Frame count doesn't match expectations. Variable frame rate (VFR) videos report one frame rate in the container metadata but actually vary. Add -vsync vfr to handle VFR correctly, or use ffprobe to check the actual frame count.

Extraction is slow on large files. Use -skip_frame nokey for keyframe-only extraction, or select= with -vsync vfr for sparse sampling. Both skip decode work on frames you don't need.

Duplicate output images. You're missing -vsync vfr with the select= filter. Without it, FFmpeg duplicates selected frames to fill gaps in the constant-rate output.

Output file names collide. Your %d format isn't wide enough. Use %06d instead of %04d if you're extracting more than 9,999 frames.

FAQ

How do I extract every Nth frame from a video?

Use the select filter with the modulo function. To extract every 100th frame:

ffmpeg -i input.mp4 -vf "select='not(mod(n\,100))'" -vsync vfr frames/%04d.jpg

Replace 100 with whatever interval you want. n is the zero-based frame number.

Can I extract frames from a specific time range only?

Yes. Use -ss for the start time and -t for the duration:

ffmpeg -ss 00:01:00 -i input.mp4 -t 00:00:30 -vf "fps=1" frames/%04d.jpg

This extracts 1 frame per second, but only from the 30-second window starting at 1:00.

What's the best format for extracted frames?

JPEG at -q:v 2 to -q:v 5 for general use. PNG when you need lossless pixel data (ML, computer vision). WebP for web delivery where file size matters.

How do I extract frames at a specific resolution?

Chain the scale filter with fps:

ffmpeg -i input.mp4 -vf "fps=1,scale=640:-1" -q:v 2 frames/%04d.jpg

640:-1 scales to 640px wide and auto-calculates height to maintain aspect ratio.

Does extracting frames reduce video quality?

No. Extraction decodes the video and writes raw frame data to image files. The original video isn't modified. Image quality depends on the output format and compression settings.

Quick reference

Task	Command
First frame	`-i input.mp4 -frames:v 1 frame.jpg`
Frame at timestamp	`ffmpeg -ss 00:01:00 -i input.mp4 -frames:v 1 frame.jpg`
1 frame/second	`-i input.mp4 -vf "fps=1" frames/%04d.jpg`
Every 5 seconds	`-i input.mp4 -vf "fps=1/5" frames/%04d.jpg`
Every Nth frame	`-i input.mp4 -vf "select='not(mod(n\,30))'" -vsync vfr frames/%04d.jpg`
Keyframes only	`ffmpeg -skip_frame nokey -i input.mp4 -vsync vfr frames/%04d.jpg`
Scene changes	`-i input.mp4 -vf "select='gt(scene\,0.3)'" -vsync vfr frames/%04d.jpg`
High-quality JPEG	`-i input.mp4 -vf "fps=1" -q:v 2 frames/%04d.jpg`
All frames as PNG	`-i input.mp4 frames/%04d.png`
Specific time range	`ffmpeg -ss 00:01:00 -i input.mp4 -t 30 -vf "fps=1" frames/%04d.jpg`
Scaled output	`-i input.mp4 -vf "fps=1,scale=640:-1" frames/%04d.jpg`

Whether you're pulling a single thumbnail or processing thousands of videos through a pipeline, the commands above have you covered. For a focused guide on thumbnail extraction specifically — including scene detection, contact sheets, and batch thumbnail generation via API — see the FFmpeg thumbnail guide. If you'd rather skip the server management, the RenderIO API runs the same FFmpeg commands over HTTP.