Captions to Video: turn subtitle lines into a branded marketing video

Paste the body of an SRT or VTT file, or any block of caption lines, and ngram reads each subtitle as a beat, plans a scene per line, and renders a branded video with the captions burned back in.


Trusted by teams at

Amazon
Google
Microsoft
Nvidia
Apple
Walmart
Salesforce
Reddit
CVS Health
PayPal
John Deere
Snap Inc.
Veeva Systems
DocuSign
DP World
Genpact
Parker Hannifin
Bio-Rad
Imperva
ITV
HubSpot
Rocket Mortgage
Tektronix
Diligent
Times Internet
Deel
Zapier
Delhivery
SafetyCulture
Demandbase
PingCAP
Quizizz
Apryse
Improvado
Taggbox
Matrixport
Glasswall
ContractSafe

How it works

Four steps from a caption file to a video you'd actually publish.

No timeline scrubbing, no manual scene-by-scene cutting, no pasting subtitles into a slide builder. The caption text you already have becomes the script, and the script becomes a storyboard you can edit before anything renders.

01

Paste the caption body or drop a file URL

Paste an SRT or VTT block straight from your editor, or link to a hosted caption file. ngram strips the timestamps and cue numbers, keeps the spoken lines, and stitches fragmented cues back into full sentences.
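The clean-up described in this step can be sketched in a few lines of Python. This is a minimal illustration, not ngram's actual parser: the function names `spoken_lines` and `stitch_sentences` are made up for the example, and the stitching rule (join fragments until one ends in sentence punctuation) is an assumed stand-in for whatever ngram does internally.

```python
import re

# Matches SRT ("00:00:01,000 --> 00:00:03,000") and VTT ("00:01.000 --> 00:03.000") timing lines.
TIMING = re.compile(
    r"^\s*(?:\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}[.,]\d{3}"
)


def spoken_lines(body: str) -> list[str]:
    """Drop the WEBVTT header, cue numbers, and timing lines; keep the caption text."""
    kept = []
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("WEBVTT") or TIMING.match(line):
            continue
        if line.isdigit():  # bare cue number (assumes no caption line is digits only)
            continue
        kept.append(line)
    return kept


def stitch_sentences(lines: list[str]) -> list[str]:
    """Join caption fragments until one ends in sentence punctuation (assumed heuristic)."""
    sentences, buffer = [], []
    for line in lines:
        buffer.append(line)
        if line.endswith((".", "!", "?")):
            sentences.append(" ".join(buffer))
            buffer = []
    if buffer:  # trailing fragment with no closing punctuation
        sentences.append(" ".join(buffer))
    return sentences
```

Calling `stitch_sentences(spoken_lines(body))` on a pasted SRT body returns the spoken text as whole sentences, with a cue that was split across two timestamps rejoined.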

02

Caption lines become a paced video script

ngram reads the cue text as a running narration, finds the hook, the body, and the close, and groups adjacent lines into scenes instead of one slide per subtitle. Your wording survives where it carries the message.

03

Review the storyboard before render

Each scene shows the caption lines it covers, the visual direction, and the duration. Merge two cues into one beat, drop a filler line, or rewrite a hook in plain language, and the script and scene plan re-flow together.

04

Export with captions burned in

One render produces 16:9 for YouTube and embeds, 1:1 for LinkedIn, and 9:16 for Reels, Shorts, and approved social channels. Captions are restyled in your brand font and burned back into every cut.

Output controls

Smart defaults from the caption text. Real knobs when you need them.

Reads SRT and VTT bodies

Paste the raw file body and ngram parses the cue index, the timestamp range, and the text. Numbered cues, blank-line separators, and WebVTT headers are handled; plain caption lines with no timing convert too.
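For readers who want to see what "cue index, timestamp range, and text" looks like as data, here is a rough Python sketch. The `Cue` record and `parse_cues` function are hypothetical names chosen for the example, not ngram's data model, and the sketch only covers timed cues; plain caption lines with no timing would need a separate path.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    index: Optional[int]  # SRT cue number; None when the file does not number cues
    start: float          # seconds
    end: float            # seconds
    text: str


TIMING = re.compile(
    r"((?:\d+:)?\d{2}:\d{2}[.,]\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}[.,]\d{3})"
)


def to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS,mmm' or 'MM:SS.mmm' to seconds."""
    seconds = 0.0
    for part in stamp.replace(",", ".").split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_cues(body: str) -> list[Cue]:
    """Split the body on blank lines and read one cue per block."""
    cues = []
    for block in re.split(r"\n\s*\n", body.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                has_index = i > 0 and lines[i - 1].isdigit()
                index = int(lines[i - 1]) if has_index else None
                text = " ".join(lines[i + 1:])
                cues.append(Cue(index, to_seconds(match.group(1)), to_seconds(match.group(2)), text))
                break  # blocks with no timing line (e.g. the WEBVTT header) are skipped
    return cues
```

Blocks without a timing line, such as the `WEBVTT` header, fall through the loop and are ignored, which is why the header needs no special case here.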

Cue-line-to-scene mapping

Short cues that belong to one thought get merged into a single scene; a long monologue cue gets split across beats. You see exactly which caption lines map to which scene in the storyboard before render.
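One way to picture the merge-and-split behavior is a simple word-count heuristic, sketched below in Python. The page describes grouping by thought, so the thresholds `min_words` and `max_words` and the function `plan_scenes` are assumptions made only to illustrate the mapping, not a description of how ngram decides.

```python
def plan_scenes(cues: list[str], min_words: int = 6, max_words: int = 14) -> list[list[str]]:
    """Group cue texts into scenes: merge short cues, split over-long ones.

    Each scene is the list of caption lines it covers.
    """
    scenes: list[list[str]] = []
    pending: list[str] = []

    def flush() -> None:
        if pending:
            scenes.append(pending.copy())
            pending.clear()

    for cue in cues:
        words = cue.split()
        if len(words) > max_words:
            # A long monologue cue closes the current scene, then spreads across beats.
            flush()
            for i in range(0, len(words), max_words):
                scenes.append([" ".join(words[i:i + max_words])])
            continue
        pending.append(cue)
        if sum(len(c.split()) for c in pending) >= min_words:
            flush()

    flush()  # keep any short trailing cue as its own scene
    return scenes
```

The returned structure mirrors what the storyboard shows: each inner list holds the caption lines that belong to one scene.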

Script-first review

Read the full narration assembled from your captions before any visual generates. Cut a line, fix a typo a hand-typed caption left behind, or tighten the open. Each edit re-paces the scenes downstream.

AI Visuals per caption beat

Each scene gets a brand-matched image or short generative clip tied to what that caption line actually says. No keyword-matched stock footage that turns a captioned explainer generic.

Voiceover that reads the captions

A default ngram voice, your cloned founder voice, or a multilingual ElevenLabs voice reads the caption text aloud. The spoken track and the on-screen captions stay in sync line for line.

Caption styling from your Brand Kit

Logo, fonts, colors, motion, intro, and outro pulled from your saved Brand Kit. The burned-in captions use your brand's caption preset, so the 50th captions-to-video looks as on-brand as the first.

Re-render in another language

Translate the caption-driven script and re-burn the captions in the target language without rebuilding the storyboard. One caption source ships a localized cut per market.

Three ratios in one render

16:9 for YouTube and embeds, 1:1 for LinkedIn feed, 9:16 for Reels and approved social channels. Smart reframing keeps the caption text legible across every aspect ratio.
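The arithmetic behind fitting one frame to three ratios can be sketched with a plain center crop. This is the simplest possible stand-in; the "smart reframing" named above presumably does more than crop from the center, and `center_crop` is a hypothetical helper written for this example.

```python
from fractions import Fraction

RATIOS = {
    "16:9": Fraction(16, 9),
    "1:1": Fraction(1, 1),
    "9:16": Fraction(9, 16),
}


def center_crop(width: int, height: int, ratio: Fraction) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of the largest centered crop with the given width:height ratio."""
    if Fraction(width, height) > ratio:
        # Source is wider than the target: keep full height, trim the sides.
        w, h = int(height * ratio), height
    else:
        # Source is taller than (or equal to) the target: keep full width, trim top and bottom.
        w, h = width, int(width / ratio)
    return ((width - w) // 2, (height - h) // 2, w, h)
```

For a 1920x1080 source, the 1:1 crop is a 1080x1080 square starting at x=420, and the 9:16 crop is a 607x1080 column. Burned-in captions would need to sit inside the narrowest of these regions to stay legible in every cut.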

Use cases

Where a captions-to-video earns its spot in the funnel.

LinkedIn distribution

Turn a captioned clip's text into a LinkedIn video post

Already have the captions from a clip you ran? Paste the cue text, render the 1:1 cut with the captions burned back in, and ride the engagement LinkedIn gives native video.

Explainer video

Caption files become a clean explainer

An old explainer's subtitle export already has the script structure. ngram lifts the cue text into a fresh explainer the support team and the website can both reuse.

Help center

Support captions become a help video

The subtitle track from a walkthrough recording carries every step in order. Convert those caption lines into a short branded help video customers watch instead of skim.

Training

Lesson captions become a training video

Caption files from a recorded session hold the full lesson. Paste them in, get a paced training video with brand visuals, and reuse it without re-recording the instructor.

Social distribution

Repurpose caption lines into short social clips

Pull the strongest cues from a longer caption file and let ngram build a 30-second branded clip around them. Post it without re-editing the original footage.

Newsletters and email

Embed a caption-driven video in your newsletter

Turn the captions from a clip into an inline video summary so subscribers who never press play on the original still get the message, captions and all.

Customer onboarding

Onboarding captions become a welcome video

The subtitle export from a setup walkthrough already lists each step. Convert it into a branded onboarding video new customers finish, with the steps captioned on screen.

Feature announcements

Feature-demo captions become a feature video

Caption text from a quick feature demo carries the whole talk track. Paste it in, get a 60-second feature video with motion graphics and a brand intro from the same lines.


Tools that pair with this converter

Clean the caption text before. Edit the video after.


Editing the video further

Take the captions-to-video output past the first cut

Built for teams

Teams who turn caption files into video every week.


Integrations

Trigger captions to video where your caption files already land.

Wire the converter into the tools that produce or store your caption exports. Every integration ships with a working captions-to-video recipe you can fork.

Zapier (no-code)

When: A new .srt or .vtt caption file lands in your Drive or Dropbox folder

Then: Convert the caption body into a 16:9 and 9:16 video with the captions burned back in, and drop both into Drive

MCP Server (agentic)

When: Claude or ChatGPT is handed a block of caption lines to turn into a video

Then: Pass the cue text to ngram and return the rendered captioned video plus a /watch share link

n8n (self-host)

When: A self-hosted pipeline writes a finished caption file to your store

Then: Convert those caption lines into a branded video without the subtitle text leaving your VPC

Make.com (scenarios)

When: A media record is updated with an approved caption export

Then: Auto-convert that caption text into three social videos and attach them to the campaign record

Chrome Extension (browser)

When: You highlight caption lines on a page and hit 'Convert to video'

Then: Get a storyboard back in a new tab, built from the selected cues and ready to review

LinkedIn (publish)

When: A captions-to-video finishes rendering

Then: Publish the 1:1 cut as a LinkedIn video post with the strongest caption line as the post copy

YouTube (publish)

When: The 16:9 cut of a captions-to-video is ready

Then: Upload it to your channel and attach the source caption file as the YouTube subtitle track

X (Twitter) (publish)

When: A 9:16 captions-to-video clip finishes rendering

Then: Post it to X with the opening caption line as the tweet copy

REST API · MCP server · Webhooks

Programmatic captions-to-video runs in roughly 20 lines against the REST API.
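To give a sense of scale for a programmatic run, here is a sketch of how such a request might be assembled in Python. Everything specific in it is an assumption: this page does not document the REST API, so the URL, the field names (`captions`, `aspect_ratios`, `burn_in_captions`), and the bearer-token header are placeholders to be replaced with whatever the real API reference specifies.

```python
import json
import urllib.request

# HYPOTHETICAL endpoint: a placeholder, not ngram's real API URL.
API_URL = "https://api.example.invalid/v1/captions-to-video"


def build_render_request(caption_body: str, api_key: str) -> urllib.request.Request:
    """Assemble (but do not send) a POST request carrying the caption text.

    All payload field names below are assumed for illustration only.
    """
    payload = {
        "captions": caption_body,                 # raw SRT/VTT body or plain caption lines
        "aspect_ratios": ["16:9", "1:1", "9:16"],  # the three cuts named on this page
        "burn_in_captions": True,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With the real endpoint and field names substituted, passing the request to `urllib.request.urlopen` would submit it. The sketch stops short of that because the placeholder host does not resolve.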

How it compares

If you've been using something else to go from captions to video.

VEED and Kapwing start from a finished video and add or restyle captions on top of it, while Pictory starts from an article or script and lays it over stock clips. ngram starts somewhere else: the caption text is the input. It reads each cue as a beat, plans a storyboard you can argue with before render, generates the visuals per scene, and burns the captions back in on the way out.

| Feature | ngram | VEED | Kapwing | Pictory |
| --- | --- | --- | --- | --- |
| Direction of the workflow | Caption text in, full video out | Existing video in, captions added on top | Existing video in, captions added on top | Article or script in, stock-clip video out |
| Reads an SRT or VTT body as the source | Yes, parses cues and builds scenes from them | Imports captions onto a clip | Imports captions onto a clip | No native caption-file input |
| How the caption text is read | Cues grouped into a hook-body-CTA script | Treated as overlay text | Treated as overlay text | Sentence-by-sentence over stock clips |
| Storyboard review before render | Full scene-by-scene plan, editable in plain language | Manual timeline editing | Manual timeline editing | Scene cards, limited script edits |
| Visual generation per scene | AI Visuals matched to each cue's meaning | Stock and uploaded media | Stock and uploaded media | Stock-library matching |
| Brand application | Brand Kit on every scene plus the caption preset | Template-based brand controls | Template-based brand controls | Brand presets, limited per-scene control |
| Aspect ratios per render | 16:9, 1:1, 9:16 from one render | One ratio per export | One ratio per export | One ratio per export |
| Re-render in another language | Translate the script and re-burn captions, no rebuild | Manual re-caption | Manual re-caption | Manual rework |
| API and agentic access | REST, MCP server, Zapier, n8n, Make | API on paid plans | API on paid plans | Limited API |

FAQ

Common questions about captions to video

Paste the body of an SRT or VTT file, or any block of caption lines. ngram strips the cue numbers and timestamps, keeps the spoken text, and assembles it into a hook-body-CTA script. It builds a scene-by-scene storyboard you review and edit in plain language, then exports in 16:9, 1:1, and 9:16 with the captions burned back in.

Still curious?

Captions → Video

Ready to turn a caption file into a branded video?

Paste the caption lines, review the storyboard, export in three ratios with the captions burned in. Roughly five minutes from paste to publish.