The quick answer
Extract one frame per second from a video:
Extract just the first frame:
Extract a frame at a specific timestamp:
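With input.mp4 standing in for your file (and a frames/ output directory that already exists), those three commands look like this:

```shell
# one frame per second
ffmpeg -i input.mp4 -vf "fps=1" frames/%04d.jpg

# just the first frame
ffmpeg -i input.mp4 -frames:v 1 frame.jpg

# a frame at a specific timestamp (1:30 here)
ffmpeg -ss 00:01:30 -i input.mp4 -frames:v 1 frame.jpg
```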
Those three commands cover maybe 80% of use cases when you need to extract frames from video with FFmpeg. The other 20% is where it gets interesting: keyframe extraction, scene detection, quality tuning, and doing all of it across hundreds of videos without babysitting a terminal.
How FFmpeg frame extraction actually works
Every video is a sequence of compressed frames. Not all frames are created equal, though. There are three types:
I-frames (keyframes): Complete images. These decode on their own.
P-frames: Store only the differences from the previous frame. Cheaper to encode, but FFmpeg needs the prior frame to reconstruct them.
B-frames: Reference both previous and future frames. Most compact, slowest to decode in isolation.
When you ask FFmpeg to extract a frame, it decodes from the nearest keyframe forward until it reaches the target. This has two practical consequences:
Seeking is not instant. Put -ss before -i for fast (but slightly imprecise) seeking that uses the container's index. Put it after -i for frame-accurate seeking that decodes everything from the start. The video trimming guide goes deeper on input vs output seeking and when each approach breaks down.

Frame count adds up fast. A 60-second video at 30fps has 1,800 frames. A 5-minute video at 30fps produces 9,000. Extract all of them as PNG and you're looking at 10-45GB of disk space. Be deliberate about what you extract.
Extract the first frame
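With input.mp4 and frame.jpg as placeholder names:

```shell
ffmpeg -i input.mp4 -frames:v 1 frame.jpg
```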
-frames:v 1 tells FFmpeg to stop after one video frame. This is the fastest extraction you can do. FFmpeg decodes just enough to produce one image.
Change the extension to .png for lossless output, or .webp for smaller files with comparable visual quality.
One thing to watch out for: some videos start with a few black frames (intros, fade-ins). If your first frame is just black, skip ahead a couple seconds:
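For example, seeking 3 seconds in before grabbing the frame (the offset is arbitrary; use whatever clears your intro):

```shell
ffmpeg -ss 3 -i input.mp4 -frames:v 1 frame.jpg
```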
Extract a frame at a specific timestamp
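For a frame at the 1-minute mark (the timestamp is just an example):

```shell
ffmpeg -ss 00:01:00 -i input.mp4 -frames:v 1 frame.jpg
```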
Putting -ss before -i (input seeking) makes FFmpeg jump to the timestamp using the container's index. It's fast but might land on the nearest keyframe rather than the exact frame. For thumbnails and preview images, that's fine. Nobody notices a 1-2 second offset.
If you need frame-exact accuracy:
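Move -ss after the input (same example timestamp):

```shell
ffmpeg -i input.mp4 -ss 00:01:00 -frames:v 1 frame.jpg
```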
This is slower because FFmpeg decodes everything from the start up to your timestamp. Seeking 90 minutes into a 2-hour video means decoding an hour and a half of frames just to get one image. Use this when precision matters (aligning frames to subtitles, syncing to specific audio cues).
You can also combine both for a middle-ground approach:
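Roughly like this, landing near the target with input -ss and finishing precisely with output -ss:

```shell
ffmpeg -ss 00:01:25 -i input.mp4 -ss 5 -frames:v 1 frame.jpg
```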
This seeks to 1:25 at the input level (fast), then decodes 5 more seconds at the output level for accuracy. Same technique described in the trim guide for cutting clips from large files.
Extract frames at regular intervals
One frame per second:
One frame every 5 seconds:
One frame every 30 seconds:
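With input.mp4 and the frames/ directory as placeholders:

```shell
# 1 frame per second
ffmpeg -i input.mp4 -vf "fps=1" frames/%04d.jpg

# 1 frame every 5 seconds
ffmpeg -i input.mp4 -vf "fps=1/5" frames/%04d.jpg

# 1 frame every 30 seconds
ffmpeg -i input.mp4 -vf "fps=1/30" frames/%04d.jpg
```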
The fps filter resamples the video to your target frame rate. fps=1 means one frame per second. fps=1/5 means one frame every five seconds. The math is straightforward: it's frames per second, so fractions give you longer intervals.
If you want 10 frames per second (useful for creating sprite sheets or smooth previews):
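Same pattern, higher rate:

```shell
ffmpeg -i input.mp4 -vf "fps=10" frames/%04d.jpg
```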
A quick note on the %04d pattern: it's a C-style format string. %04d produces zero-padded 4-digit numbers (0001, 0002, ..., 9999). If you're extracting more than 9,999 frames, bump it to %06d. FFmpeg will silently overwrite frames if the counter wraps.
For more filter examples, the 50-command cheat sheet covers fps, scale, overlay, and other common video filters.
Extract all frames
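The whole command is just an input and an output pattern (placeholder names):

```shell
ffmpeg -i input.mp4 frames/%04d.png
```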
No filter, no frame limit. FFmpeg extracts every single frame. A 10-second clip at 24fps produces 240 images. A 5-minute video at 30fps produces 9,000.
Only do this for short clips or workflows that genuinely need every frame (frame-by-frame analysis, ML training datasets, rotoscoping). For anything else, use the fps filter to sample at a reasonable rate.
To check how many frames you'll get before running the extraction:
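One way to do it with ffprobe; -count_frames forces a full decode, which is what makes it exact (and slow):

```shell
ffprobe -v error -count_frames -select_streams v:0 \
  -show_entries stream=nb_read_frames -of csv=p=0 input.mp4
```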
This decodes the entire video (takes a moment for long files) but gives you the exact frame count. Useful before committing to a 50,000-frame extraction.
fps= vs select=: which is faster?
This comes up constantly on Stack Overflow. Both fps= and select= can extract frames at intervals, but they work differently under the hood.
fps= filter resamples the entire video to a new frame rate. FFmpeg decodes all frames, picks the nearest frame to each tick, and outputs it.
select= filter evaluates an expression for each frame and decides whether to output it. Combined with -vsync vfr, it only outputs selected frames.
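For example:

```shell
ffmpeg -i input.mp4 -vf "select='not(mod(n\,30))'" -vsync vfr frames/%04d.jpg
```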
This extracts every 30th frame. n is the frame number, mod(n,30) is zero for every 30th frame, not() inverts it to select those frames.
Which is faster? In practice, not meaningfully different. Neither filter skips decoding: both see every decoded frame, so both pay the dominant cost of a full decode, and both only encode the frames they actually output. For short clips either one finishes a 2-minute video in about a second. The real distinction is control, not speed: fps= thinks in time (resampling to a target rate, so spacing stays even), while select= thinks in whatever its expression says, whether that's frame numbers, timestamps, or scene scores. If extraction on a long file is too slow, switching filters won't save you; -skip_frame nokey (keyframe-only extraction, covered below) is what actually skips decode work.
One gotcha with select=: always add -vsync vfr (on FFmpeg 5.1 and newer, -fps_mode vfr is the non-deprecated spelling). Without it, FFmpeg duplicates frames to maintain a constant frame rate output, which defeats the purpose and produces duplicate images.
The FFmpeg commands reference lists both approaches with ready-to-use API examples.
Extract keyframes only
Keyframes (I-frames) are fully encoded frames that don't depend on other frames. They decode instantly because there's no inter-frame prediction to resolve, and they're typically the sharpest frames in the video.
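A sketch that pulls the first 10 keyframes (file names are placeholders):

```shell
ffmpeg -skip_frame nokey -i input.mp4 -vsync vfr -frames:v 10 frames/%04d.jpg
```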
-skip_frame nokey tells the decoder to skip everything except keyframes. This is significantly faster than normal decoding because FFmpeg doesn't process P-frames or B-frames at all.
To extract all keyframes (not just the first 10), drop the -frames:v limit:
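```shell
ffmpeg -skip_frame nokey -i input.mp4 -vsync vfr frames/%04d.jpg
```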
Real-world keyframe intervals vary. Most H.264/H.265 videos use a GOP (Group of Pictures) size of 48-250 frames. At 30fps, that translates to a keyframe every 1.6 to 8.3 seconds. A 5-minute video typically yields 35-180 keyframes depending on the encoder settings.
You can check where keyframes fall with ffprobe:
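One way to list keyframe timestamps, one per line (on older FFmpeg builds the field is named pkt_pts_time rather than pts_time):

```shell
ffprobe -v error -select_streams v:0 -skip_frame nokey \
  -show_entries frame=pts_time -of csv=p=0 input.mp4
```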
When to use keyframe extraction:
Generating video thumbnails where any representative frame works
Quick visual audit of a video's content without watching it
Building a visual index or storyboard
Pre-filtering before running expensive operations like scene detection
Scene detection frame extraction
FFmpeg's select filter has a scene function that scores each frame for visual change compared to the previous frame. Values range from 0 (identical) to 1 (completely different).
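A threshold of 0.3 looks like this:

```shell
ffmpeg -i input.mp4 -vf "select='gt(scene\,0.3)'" -vsync vfr frames/%04d.jpg
```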
This extracts frames where the scene change score exceeds 0.3. Lower values (0.1-0.2) catch more transitions including slow dissolves. Higher values (0.4-0.5) only trigger on hard cuts.
The right threshold depends on the content:
Talking head videos: 0.3-0.4 catches camera angle changes without grabbing every head movement
Music videos or action sequences: 0.2-0.25 to catch rapid cuts
Slideshows or presentations: 0.15-0.2 catches slide transitions reliably
Surveillance/dashcam footage: 0.4-0.5 to only flag significant changes
You can combine scene detection with a minimum interval to avoid extracting a burst of frames during a montage:
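One way to write it, multiplying the scene threshold by an either-or of two timing conditions:

```shell
ffmpeg -i input.mp4 \
  -vf "select='gt(scene\,0.3)*(isnan(prev_selected_t)+gte(t-prev_selected_t\,2))'" \
  -vsync vfr frames/%04d.jpg
```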
That expression ensures at least 2 seconds between extracted frames, even during rapid scene changes. The expression is ugly, but select filter logic often is. Here's what it does:

isnan(prev_selected_t) is true for the very first frame (no previous selection exists)

gte(t-prev_selected_t,2) is true when at least 2 seconds have passed since the last selected frame

Both branches require gt(scene,0.3), the scene change threshold
JPEG vs PNG vs WebP: choosing the right format
The output format matters more than people think, especially at scale.
PNG is lossless. Every pixel is preserved exactly. A single 1080p frame is typically 2-5MB. Use PNG when you need exact pixel data: computer vision pipelines, ML training, frame-accurate editing.
JPEG is lossy but practical. A 1080p frame runs 50-200KB at default quality. The -q:v flag controls quality on a scale of 1 (best) to 31 (worst):
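For example, high-quality JPEGs at one frame per second:

```shell
ffmpeg -i input.mp4 -vf "fps=1" -q:v 2 frames/%04d.jpg
```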
WebP gives better compression than JPEG at equivalent visual quality. Typical file sizes are 25-35% smaller. Browser support is universal at this point (even Safari since 2020), though some image processing libraries still lag.
For web thumbnails, JPEG at -q:v 2 to -q:v 5 is the practical choice. You get good quality at reasonable file sizes, and literally every system on earth can handle a JPEG. Use WebP if you control the display environment and want smaller files.
Here's a rough comparison for a typical 1080p frame:
| Format | Quality setting | File size | Notes |
| --- | --- | --- | --- |
| PNG | lossless | 3-5 MB | Pixel-perfect, largest |
| JPEG | -q:v 2 | 150-200 KB | High quality, good default |
| JPEG | -q:v 5 | 80-120 KB | Visually indistinguishable from q:v 2 for thumbnails |
| JPEG | -q:v 15 | 30-50 KB | Noticeable artifacts on close inspection |
| WebP | -quality 80 | 60-90 KB | Comparable to JPEG q:v 3 |
Batch extraction: multiple videos at once
When you need to extract frames from a folder of videos, a shell loop works fine:
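A sketch, assuming inputs in videos/ and output under frames/<video-name>/:

```shell
for f in videos/*.mp4; do
  name=$(basename "$f" .mp4)
  mkdir -p "frames/$name"
  ffmpeg -i "$f" -vf "fps=1" "frames/$name/%04d.jpg"
done
```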
This extracts one frame per second from each MP4 in the videos/ directory and saves them in named subdirectories.
For keyframe thumbnails (one representative frame per video):
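Same loop shape, one output image per video (directory names are placeholders):

```shell
mkdir -p thumbs
for f in videos/*.mp4; do
  ffmpeg -ss 3 -i "$f" -frames:v 1 "thumbs/$(basename "$f" .mp4).jpg"
done
```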
Grabs a frame at the 3-second mark from each video. The 3-second offset skips black intro frames that many videos start with.
To run extractions in parallel (useful with multi-core machines):
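One way with xargs (directory names are placeholders; the sh -c wrapper keeps filenames with spaces safe):

```shell
mkdir -p thumbs
find videos -name '*.mp4' -print0 | xargs -0 -P 4 -I {} sh -c \
  'ffmpeg -ss 3 -i "$1" -frames:v 1 "thumbs/$(basename "$1" .mp4).jpg"' _ {}
```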
-P 4 runs 4 FFmpeg processes simultaneously. Adjust based on your CPU cores and available memory. Each FFmpeg instance doing 1080p decode uses roughly 200-400MB of RAM.
This approach works for tens or maybe low hundreds of videos on a decent machine. Beyond that, or when extraction is triggered by user uploads in a web app, you want cloud infrastructure. That's where an API comes in.
Extract frames via API
Running FFmpeg locally works until you need to process at scale: hundreds of videos, extraction triggered by uploads, or frames needed across a distributed system. At that point, you're managing FFmpeg installations, worker queues, disk space, and CPU allocation.
RenderIO's FFmpeg API runs your FFmpeg commands in the cloud. Same syntax, no infrastructure to manage. Grab an API key and the examples below work as-is.
Extract one frame at the 5-second mark:
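A sketch of the request shape. The endpoint path and JSON field names below are illustrative placeholders, not the documented API (check the RenderIO docs for the real ones); the ffmpeg arguments are the same as in the CLI examples above:

```shell
# hypothetical endpoint and payload fields
curl -X POST "https://api.renderio.example/v1/commands" \
  -H "Authorization: Bearer $RENDERIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "https://example.com/video.mp4",
    "command": "-ss 5 -i {{input}} -frames:v 1 frame.jpg"
  }'
```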
Extract frames at 1fps from a remote video:
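Same request shape (again, endpoint and field names are placeholders), swapping in the fps filter:

```shell
curl -X POST "https://api.renderio.example/v1/commands" \
  -H "Authorization: Bearer $RENDERIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "https://example.com/video.mp4",
    "command": "-i {{input}} -vf fps=1 frames/%04d.jpg"
  }'
```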
Scene detection via API, extracting only visually distinct frames:
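And with the scene filter (placeholder endpoint and fields as before; the doubled backslash keeps the escaped comma intact through JSON):

```shell
curl -X POST "https://api.renderio.example/v1/commands" \
  -H "Authorization: Bearer $RENDERIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "https://example.com/video.mp4",
    "command": "-i {{input}} -vf select=gt(scene\\,0.3) -vsync vfr frames/%04d.jpg"
  }'
```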
The API downloads your input, runs the command, and stores the output files. Poll the command status endpoint or set up a webhook to get notified when it's done. The curl examples guide has more patterns like this.
Extract frames with Python
If you're building a pipeline (handling uploads, generating training data, creating video previews), you probably want Python, not shell scripts. The Python FFmpeg API tutorial covers the full setup. Here's frame extraction specifically:
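A minimal sketch using only the standard library. The endpoint URL and JSON field names are placeholders for the real ones in the RenderIO docs; the ffmpeg arguments match the CLI examples above:

```python
import json
import os
import urllib.request

API_URL = "https://api.renderio.example/v1/commands"  # placeholder endpoint


def build_payload(video_url: str, timestamp: float) -> dict:
    """Build the request body for a single-frame extraction."""
    return {
        "input": video_url,
        "command": f"-ss {timestamp} -i {{{{input}}}} -frames:v 1 frame.jpg",
    }


def extract_frame(video_url: str, timestamp: float = 5.0) -> dict:
    """Submit the command and return the API's JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(video_url, timestamp)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['RENDERIO_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(extract_frame("https://example.com/video.mp4"))
```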
For batch processing, concurrent.futures lets you process multiple videos in parallel. Each API call runs on its own cloud worker, so you're not bottlenecked by local CPU:
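A sketch of the fan-out. extract_frames here is a stub standing in for the API call (which is network I/O, so threads are the right pool type):

```python
from concurrent.futures import ThreadPoolExecutor


def extract_frames(video_url: str) -> str:
    # placeholder for the real API call: submit the ffmpeg command
    # for this video and return a job id to poll later
    return f"job-for-{video_url}"


videos = [f"https://example.com/video{i}.mp4" for i in range(10)]

# each submission is an independent HTTP request; 8 in flight at a time
with ThreadPoolExecutor(max_workers=8) as pool:
    job_ids = list(pool.map(extract_frames, videos))
```

pool.map preserves input order, so job_ids lines up with videos even though the calls complete out of order.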
Extract frames with Node.js
Same approach in JavaScript. The Node.js FFmpeg API guide has the full walkthrough.
Common use cases
The most common reason to extract frames is generating video thumbnails. Grab a frame at a fixed offset (3-5 seconds in) or use scene detection to find something visually interesting. Every video platform does this. The e-commerce video processing guide walks through the full pipeline for product video thumbnails.
Sprite sheets are another big one. Extract frames at regular intervals (every 5-10 seconds), stitch them into a grid image with ImageMagick or a canvas library, and you've got the preview thumbnails that show up when you hover over a video timeline. YouTube and Netflix both do this.
For ML training data, you'll want to extract all frames or sample at a high rate in PNG format to preserve pixel data. Scene detection helps here too: it filters out near-duplicate frames that waste training budget and bias your model toward static scenes.
Keyframe extraction is useful for quality inspection. Pull the keyframes, scan them for encoding artifacts or corruption, and you've audited the video without watching the whole thing. The video compression guide pairs well with this for diagnosing post-transcode quality issues.
Content moderation follows a similar pattern: sample frames at intervals, run them through an image classifier. Scene detection catches transitions where content might change, so you're more likely to flag problematic sections.
For e-commerce product videos, keyframe extraction (select=eq(pict_type,I)) tends to produce the sharpest frames for product listing thumbnails. The e-commerce video processing guide covers the full pipeline.
Troubleshooting
Output frames are blank or garbled. You're probably using stream copy (-c copy) with a non-keyframe seek position. Drop -c copy and let FFmpeg decode/re-encode the frame.
Frame count doesn't match expectations. Variable frame rate (VFR) videos report one frame rate in the container metadata but actually vary. Add -vsync vfr to handle VFR correctly, or use ffprobe to check the actual frame count.
Extraction is slow on large files. Use -skip_frame nokey for keyframe-only extraction, or select= with -vsync vfr for sparse sampling. Both skip decode work on frames you don't need.
Duplicate output images. You're missing -vsync vfr with the select= filter. Without it, FFmpeg duplicates selected frames to fill gaps in the constant-rate output.
Output file names collide. Your %d format isn't wide enough. Use %06d instead of %04d if you're extracting more than 9,999 frames.
FAQ
How do I extract every Nth frame from a video?
Use the select filter with the modulo function. To extract every 100th frame:
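For example:

```shell
ffmpeg -i input.mp4 -vf "select='not(mod(n\,100))'" -vsync vfr frames/%04d.jpg
```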
Replace 100 with whatever interval you want. n is the zero-based frame number.
Can I extract frames from a specific time range only?
Yes. Use -ss for the start time and -t for the duration:
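```shell
ffmpeg -ss 00:01:00 -i input.mp4 -t 30 -vf "fps=1" frames/%04d.jpg
```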
This extracts 1 frame per second, but only from the 30-second window starting at 1:00.
What's the best format for extracted frames?
JPEG at -q:v 2 to -q:v 5 for general use. PNG when you need lossless pixel data (ML, computer vision). WebP for web delivery where file size matters.
How do I extract frames at a specific resolution?
Chain the scale filter with fps:
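```shell
ffmpeg -i input.mp4 -vf "fps=1,scale=640:-1" frames/%04d.jpg
```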
640:-1 scales to 640px wide and auto-calculates height to maintain aspect ratio.
Does extracting frames reduce video quality?
No. Extraction decodes the video and writes raw frame data to image files. The original video isn't modified. Image quality depends on the output format and compression settings.
Quick reference
| Task | Command |
| --- | --- |
| First frame | ffmpeg -i input.mp4 -frames:v 1 frame.jpg |
| Frame at timestamp | ffmpeg -ss 00:01:00 -i input.mp4 -frames:v 1 frame.jpg |
| 1 frame/second | ffmpeg -i input.mp4 -vf "fps=1" frames/%04d.jpg |
| Every 5 seconds | ffmpeg -i input.mp4 -vf "fps=1/5" frames/%04d.jpg |
| Every Nth frame | ffmpeg -i input.mp4 -vf "select='not(mod(n\,30))'" -vsync vfr frames/%04d.jpg |
| Keyframes only | ffmpeg -skip_frame nokey -i input.mp4 -vsync vfr frames/%04d.jpg |
| Scene changes | ffmpeg -i input.mp4 -vf "select='gt(scene\,0.3)'" -vsync vfr frames/%04d.jpg |
| High-quality JPEG | ffmpeg -i input.mp4 -vf "fps=1" -q:v 2 frames/%04d.jpg |
| All frames as PNG | ffmpeg -i input.mp4 frames/%04d.png |
| Specific time range | ffmpeg -ss 00:01:00 -i input.mp4 -t 30 -vf "fps=1" frames/%04d.jpg |
| Scaled output | ffmpeg -i input.mp4 -vf "fps=1,scale=640:-1" frames/%04d.jpg |
Whether you're pulling a single thumbnail or processing thousands of videos through a pipeline, the commands above have you covered. If you'd rather skip the server management, the RenderIO API runs the same FFmpeg commands over HTTP.