Pull audio from video without touching a terminal
You have video interviews to transcribe. Or podcast episodes recorded as video. Or a music library trapped in MP4 files. You need the audio track extracted.
FFmpeg does this in one command. n8n can trigger that command automatically whenever a new video appears. No manual steps. No terminal. No server.
The problem: n8n can't extract audio natively
n8n doesn't have an audio extraction node. The cloud version doesn't allow shell commands. Even self-hosted, running FFmpeg inside n8n blocks the worker and risks crashes on large files.
The solution: send the extraction command to RenderIO's API via n8n's HTTP Request node. RenderIO runs FFmpeg in an isolated container. Your n8n instance stays responsive.
Use the RenderIO n8n node
RenderIO has a partner-verified community node on the n8n marketplace. Install from Settings → Community Nodes → search "renderio". It provides a visual interface for FFmpeg commands, including audio extraction.
The node handles authentication and request formatting automatically. The extraction examples below use HTTP Request nodes for full flexibility, but the same FFmpeg commands work with the native node.
Basic extraction: MP4 to MP3
The simplest workflow: video URL in, MP3 URL out.
HTTP Request node configuration:
Method: POST
URL:
https://renderio.dev/api/v1/run-ffmpeg-command
Authentication: Header Auth (X-API-KEY)
Body:
-vn disables video. -q:a 2 sets the MP3 variable-bitrate quality level (0 = best, 9 = worst; 2 is high quality, averaging about 190 kbps).
Poll for completion, then use the output URL.
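As a rough sketch of what the node could send, the flags described above can be assembled into a single command string. The JSON field name `ffmpeg_command`, the example input URL, and the output filename are assumptions made for illustration, not confirmed parts of the RenderIO API; check the API reference for the actual body shape.

```python
import json
import shlex


def build_mp3_command(input_url: str, output_name: str = "output.mp3") -> str:
    """Build an FFmpeg command that drops the video and encodes MP3."""
    args = [
        "ffmpeg",
        "-i", input_url,  # source video
        "-vn",            # disable the video stream
        "-q:a", "2",      # MP3 quality level 2
        output_name,
    ]
    return shlex.join(args)


# Hypothetical request body; the real field names may differ.
body = json.dumps(
    {"ffmpeg_command": build_mp3_command("https://example.com/interview.mp4")}
)
print(body)
```

In n8n the same string would go into the HTTP Request node's JSON body, with the API key supplied through the X-API-KEY header credential.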
Extraction formats
MP3 (most compatible)
Best for: sharing, podcast distribution, general use.
WAV (lossless)
Best for: transcription services (they often prefer WAV), audio editing, archival.
AAC (Apple/streaming)
Best for: Apple devices, streaming platforms, smaller files than MP3 at same quality.
FLAC (lossless compressed)
Best for: archival when you want lossless but smaller than WAV (typically 50-60% of WAV size).
OGG/Opus (web)
Best for: web applications, voice recordings, VoIP.
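One way to keep the five formats above manageable in a workflow is a small lookup table from target format to FFmpeg audio options. This is a sketch: the encoder names (`libmp3lame`, `pcm_s16le`, `aac`, `flac`, `libopus`) are standard FFmpeg encoders, but the bitrates and the helper name `extraction_args` are illustrative assumptions.

```python
from typing import Dict, List

# Encoder options per target format (keyed by output file extension).
# Bitrates are illustrative; tune them for your material.
FORMAT_ARGS: Dict[str, List[str]] = {
    "mp3": ["-c:a", "libmp3lame", "-q:a", "2"],
    "wav": ["-c:a", "pcm_s16le"],           # uncompressed 16-bit PCM
    "m4a": ["-c:a", "aac", "-b:a", "192k"],  # AAC in an MP4 audio container
    "flac": ["-c:a", "flac"],
    "opus": ["-c:a", "libopus", "-b:a", "64k"],
}


def extraction_args(fmt: str, input_path: str, output_stem: str) -> List[str]:
    """Return the FFmpeg argument list for extracting audio as `fmt`."""
    if fmt not in FORMAT_ARGS:
        raise ValueError(f"unsupported format: {fmt}")
    output_path = f"{output_stem}.{fmt}"
    return ["ffmpeg", "-i", input_path, "-vn", *FORMAT_ARGS[fmt], output_path]


print(" ".join(extraction_args("flac", "input.mp4", "output")))
```

Every variant keeps `-vn`, so only the audio options and the output extension change between formats.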
Complete workflow: Extract and transcribe
Combine audio extraction with a transcription service:
Node 1: Google Drive Trigger
Watches a "Videos" folder for new uploads.
Node 2: Extract audio (HTTP Request)
Note: -ar 16000 -ac 1 converts to 16kHz mono. This is the format most transcription APIs prefer. Smaller files, faster uploads, same transcription quality.
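To put a number on the "smaller files" point, here is a short calculation of uncompressed audio size. It assumes 16-bit PCM WAV output and compares against 44.1 kHz stereo as a typical source format; both are assumptions for illustration, and the command at the end uses placeholder filenames.

```python
def pcm_bytes(sample_rate_hz: int, channels: int, seconds: int,
              bits_per_sample: int = 16) -> int:
    """Size of raw PCM audio in bytes (WAV header overhead ignored)."""
    return sample_rate_hz * channels * (bits_per_sample // 8) * seconds


ONE_HOUR = 3600
stereo_44k = pcm_bytes(44_100, 2, ONE_HOUR)  # 635,040,000 bytes
mono_16k = pcm_bytes(16_000, 1, ONE_HOUR)    # 115,200,000 bytes
print(f"44.1 kHz stereo: {stereo_44k / 1e6:.1f} MB per hour")
print(f"16 kHz mono:     {mono_16k / 1e6:.1f} MB per hour")

# The extraction command for this case, with placeholder filenames:
command = "ffmpeg -i input.mp4 -vn -ar 16000 -ac 1 output.wav"
```

Under these assumptions an hour of audio drops from roughly 635 MB to roughly 115 MB, about a 5.5x reduction before any compression.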
Nodes 3-5: Poll and get result
Standard polling loop.
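The logic of such a loop, sketched outside n8n: check the job, stop on a terminal state, otherwise wait and retry up to a limit. The status values `SUCCESS` and `FAILED`, the `output_url` key, and the `fetch_status` callable are hypothetical stand-ins for whatever the status endpoint actually returns.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until the job finishes or attempts run out."""
    for _ in range(max_attempts):
        job = fetch_status()
        status = job.get("status")
        if status == "SUCCESS":   # hypothetical terminal state
            return job
        if status == "FAILED":    # hypothetical terminal state
            raise RuntimeError(job.get("error", "extraction failed"))
        sleep(interval_s)         # still processing: wait, then retry
    raise TimeoutError(f"job not finished after {max_attempts} checks")


# Usage with canned responses instead of a real HTTP call:
responses = iter([
    {"status": "PROCESSING"},
    {"status": "SUCCESS", "output_url": "https://example.com/audio.mp3"},
])
result = poll_until_done(lambda: next(responses), sleep=lambda s: None)
print(result["output_url"])
```

In n8n the equivalent is a Wait node, an HTTP Request node that fetches the status, and an IF node that either exits or loops back to the Wait node.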
Node 6: Send to transcription
Batch extraction from a video library
Process an entire folder of videos:
Step 1: Get video list
Use a Code node or fetch from a spreadsheet:
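A minimal sketch of the Code node variant, written in Python: it turns a list of video URLs into one item per video and derives the output filename from each URL. The URLs and the field names `video_url` and `output_name` are hypothetical; the `{"json": ...}` wrapper follows the item structure n8n uses for data passed between nodes.

```python
import posixpath
from typing import Dict, List
from urllib.parse import urlparse


def to_items(video_urls: List[str]) -> List[Dict[str, Dict[str, str]]]:
    """Wrap each video URL as an n8n-style item with a derived MP3 name."""
    items = []
    for url in video_urls:
        filename = posixpath.basename(urlparse(url).path)  # interview1.mp4
        stem, _ext = posixpath.splitext(filename)          # interview1
        items.append(
            {"json": {"video_url": url, "output_name": f"{stem}.mp3"}}
        )
    return items


videos = [
    "https://example.com/videos/interview1.mp4",
    "https://example.com/videos/interview2.mp4",
]
items = to_items(videos)
print(items[0])
```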
Step 2: Split in Batches (size: 5)
Step 3: Submit extraction for each
Step 4: Poll and collect URLs
Step 5: Write results to spreadsheet
| Video | Audio URL | Status |
| interview1 | https://media.renderio.dev/interview1.mp3 | extracted |
| interview2 | https://media.renderio.dev/interview2.mp3 | extracted |
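The batching and result-collection steps can be sketched in a few lines. `chunked` mirrors what the Split in Batches node does with a size of 5, and `result_row` builds one spreadsheet row per video; the "failed" status value is an assumption added for illustration, since the table above only shows successful rows.

```python
from typing import Dict, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int = 5) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def result_row(video: str, audio_url: Optional[str]) -> Dict[str, str]:
    """Build one spreadsheet row; a missing URL marks the job as failed."""
    return {
        "Video": video,
        "Audio URL": audio_url or "",
        "Status": "extracted" if audio_url else "failed",
    }


videos = [f"interview{n}" for n in range(1, 13)]  # 12 example names
batch_sizes = [len(batch) for batch in chunked(videos, 5)]
print(batch_sizes)
print(result_row("interview1", "https://media.renderio.dev/interview1.mp3"))
```

Submitting five jobs at a time keeps the number of concurrent polls small while still working through the whole list.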
Audio processing after extraction
Once you have the audio, you can process it further:
Normalize volume:
Trim silence from start/end:
Convert sample rate:
Chain these into your workflow as additional processing steps after extraction.
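The three steps above can be expressed as one argument builder. This sketch uses FFmpeg's `loudnorm` filter for normalization, `silenceremove` with the reverse-trim-reverse pattern for trailing silence, and `-ar` for the sample rate. The -50 dB threshold and the helper name are assumptions; these may not be the exact settings the original workflow used.

```python
from typing import List, Optional

# Trim silence at the start of the stream; -50 dB is an assumed threshold.
TRIM_START = "silenceremove=start_periods=1:start_threshold=-50dB"


def post_process_args(
    input_path: str,
    output_path: str,
    normalize: bool = False,
    trim_silence: bool = False,
    sample_rate: Optional[int] = None,
) -> List[str]:
    """Build FFmpeg arguments for optional post-extraction processing."""
    filters: List[str] = []
    if trim_silence:
        # Trim the start, reverse, trim the (former) end, reverse back.
        filters += [TRIM_START, "areverse", TRIM_START, "areverse"]
    if normalize:
        filters.append("loudnorm")  # EBU R128 loudness normalization
    args = ["ffmpeg", "-i", input_path]
    if filters:
        args += ["-af", ",".join(filters)]  # filters run left to right
    if sample_rate is not None:
        args += ["-ar", str(sample_rate)]
    args.append(output_path)
    return args


print(" ".join(post_process_args("in.mp3", "out.mp3",
                                 normalize=True, sample_rate=44100)))
```

Combining the filters in a single `-af` chain avoids re-encoding the audio once per step, which matters for lossy formats such as MP3.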
Error handling
Common extraction failures:
No audio track: Some screen recordings or animations have no audio, so FFmpeg returns an error. Handle this with an IF node that checks whether the error message contains "does not contain any stream".
Corrupted audio: Add -err_detect ignore_err before -i to attempt extraction despite minor corruption.
Very long videos: Extraction is fast (typically 10-30 seconds regardless of video length) because it only copies/transcodes the audio stream, not the video.
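The first two cases can be sketched as follows: a substring check equivalent to the IF node condition, and a command that places `-err_detect ignore_err` ahead of `-i` so it applies to the input. The function names and the returned labels are hypothetical.

```python
from typing import List

NO_STREAM_MARKER = "does not contain any stream"


def classify_failure(error_message: str) -> str:
    """Label a failed job so the workflow can branch on the result."""
    if NO_STREAM_MARKER in error_message:
        return "no_audio"  # nothing to extract; skip this video
    return "other"         # candidate for a retry with tolerant decoding


def tolerant_args(input_path: str, output_path: str) -> List[str]:
    """Extraction command that tries to continue past minor corruption."""
    return [
        "ffmpeg",
        "-err_detect", "ignore_err",  # input option: must precede -i
        "-i", input_path,
        "-vn",
        output_path,
    ]


print(classify_failure("Output file #0 does not contain any stream"))
print(" ".join(tolerant_args("input.mp4", "output.mp3")))
```

Option order matters here: FFmpeg applies an option to the next file named on the command line, so the flag has to come before the `-i` it should affect.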
Get started
The Starter plan at $9/mo includes 500 commands -- enough to set up and test your audio extraction workflow.