VOD Deep Dive Part 11: End-to-End Workflow — From Upload to Playback
The complete 10-step VOD production pipeline: upload, content moderation, probe, transcode, package, publish, CDN pre-warm, orchestration with Step Functions and Temporal, disaster recovery.
This is Part 11 of the VOD Streaming Deep Dive series.
The Full Pipeline at a Glance
A video goes through at least 10 steps from file handoff to global playback:
① Upload
② Content Scan: NSFW / copyright fingerprint / virus scan
③ Probe: ffprobe / resolution
④ Transcode: multi-rate H.264 + H.265 + AV1
⑤ Package + Encrypt: CMAF + DRM signing
⑥ Publish to Storage: S3 / object storage
⑦ Write Metadata: MySQL / ES index
⑧ CDN Pre-warm: major-region edge nodes
⑨ Notify: push / homepage
⑩ Monitor: QoE / regression
Every step can fail. A mature platform has retries, alerts, and manual fallbacks at each stage.
Step 1: Upload — Getting Large Files to the Cloud
Challenges:
- Source files are often 5–50 GB (ProRes / DNxHD)
- Unreliable networks (cross-border, office WiFi)
- Users may close their laptop or lose connection mid-upload
Multipart Upload
Split the file into 5–100 MB chunks and upload them concurrently (S3 requires every part except the last to be at least 5 MiB):
Large file ────► [Chunk 1] [Chunk 2] [Chunk 3] ... [Chunk N]
│ │ │ │
▼ ▼ ▼ ▼
S3 Multipart concurrent upload (5-10 parallel)
│
▼
Merge after all complete
Benefits: Parallel speedup, resume on failure (only retry failed chunks), natively supported by S3/GCS/Azure Blob.
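Chunk size cannot be picked arbitrarily on S3: a multipart upload allows at most 10,000 parts, and each part must be between 5 MiB and 5 GiB (the last part may be smaller). The following is a minimal sketch of choosing a part size under those limits; the function names and the 16 MiB default are assumptions for illustration, not something the pipeline prescribes.

```python
MIB = 1024 * 1024
MIN_PART = 5 * MIB           # S3 minimum part size (last part may be smaller)
MAX_PART = 5 * 1024 * MIB    # S3 maximum part size (5 GiB)
MAX_PARTS = 10_000           # S3 maximum number of parts per upload


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def choose_part_size(file_size: int, preferred: int = 16 * MIB) -> int:
    """Return a part size in bytes that keeps the upload within S3's limits."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Smallest part size that still fits the whole file into MAX_PARTS parts.
    floor_for_count = ceil_div(file_size, MAX_PARTS)
    size = max(preferred, floor_for_count, MIN_PART)
    if size > MAX_PART:
        raise ValueError("file too large for a single multipart upload")
    return size


def part_count(file_size: int, part_size: int) -> int:
    """Number of parts needed to cover the file."""
    return ceil_div(file_size, part_size)
```

For a 50 GiB mezzanine the 16 MiB default yields 3,200 parts, comfortably under the part limit; a 5 MB chunk size on the same file would sit right at the edge of it.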
Dedup (MD5/CRC Pre-Check)
Before uploading, the client computes an MD5 hash of the file and asks the backend whether it already exists:
Client → Backend: "Uploading file, md5=abc123"
Backend → Client: "Already have it, skip upload" ← instant
or
"Not found, please upload"
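The client side of this pre-check can be sketched as follows. The file is hashed in blocks so a 50 GB source never has to fit in memory. `needs_upload` and the `known_hashes` set are hypothetical stand-ins for the backend lookup, which in practice would be an API call.

```python
import hashlib


def file_md5(path, block_size=8 * 1024 * 1024):
    """Stream the file through MD5 and return the hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_upload(md5_hex, known_hashes):
    """Hypothetical backend check: upload only if the hash is unknown."""
    return md5_hex not in known_hashes
```

MD5 is not collision-resistant, so it suits accidental-duplicate detection rather than anything security-sensitive; pairing it with the file size or a stronger hash is a common precaution.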
S3 Multipart API (Three Steps)
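import boto3
s3 = boto3.client('s3')
PART_SIZE = 16 * 1024 * 1024  # every part except the last must be >= 5 MiB
def read_chunks(path, size=PART_SIZE):
    """Yield the file in fixed-size chunks so it never loads fully into memory."""
    with open(path, 'rb') as f:
        while chunk := f.read(size):
            yield chunk
# 1. Initiate multipart upload
resp = s3.create_multipart_upload(Bucket='my-bucket', Key='video.mov')
upload_id = resp['UploadId']
# 2. Upload each part (part numbers start at 1); keep each ETag for step 3
parts = []
for i, chunk in enumerate(read_chunks('video.mov'), start=1):
    resp = s3.upload_part(
        Bucket='my-bucket', Key='video.mov',
        UploadId=upload_id, PartNumber=i,
        Body=chunk)
    parts.append({'PartNumber': i, 'ETag': resp['ETag']})
# 3. Complete and merge
s3.complete_multipart_upload(
    Bucket='my-bucket', Key='video.mov',
    UploadId=upload_id,
    MultipartUpload={'Parts': parts})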
Step 2: Content Scanning — Compliance
Uploads land in a quarantine bucket, not production storage. Content must pass review before it is promoted.
Required Checks
| Check | Tools |
|---|---|
| Virus scan | ClamAV, VirusTotal API |
| NSFW detection | AWS Rekognition Content Moderation |
| Violence / gore / political sensitivity | Same as above |
| Copyright fingerprinting | Audible Magic (music), ACRCloud |
| Face recognition (if needed) | AWS Rekognition |
| Subtitle compliance | Keyword filtering, regional adaptation |
Human Review
AI is a first pass only. High-value or borderline content enters a human review queue:
- Reviewer watches (sampled or full)
- Marks compliant or non-compliant
- Records reasoning
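The split between automated decisions and the human queue can be sketched as a simple routing function. The thresholds, the `high_value` flag, and the `ModerationLabel` shape are assumptions for illustration; real moderation services return richer label hierarchies.

```python
from dataclasses import dataclass


@dataclass
class ModerationLabel:
    name: str
    confidence: float  # 0-100, as reported by the (assumed) moderation service


def route(labels, reject_at=95.0, review_at=60.0, high_value=False):
    """Return 'reject', 'human_review', or 'approve' for one upload."""
    top = max((label.confidence for label in labels), default=0.0)
    if top >= reject_at:
        return "reject"          # unambiguous violation
    if top >= review_at or high_value:
        return "human_review"    # borderline or high-value content
    return "approve"
```

A clean upload with no labels is approved automatically, while a 72%-confidence violence label lands in the review queue.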
Step 3: Probe — Validate Before Transcoding
ffprobe scans the file to extract metadata:
ffprobe -v error -show_format -show_streams -of json input.mov > probe.json
Extracts: resolution, frame rate, codec, audio tracks, sample rate, duration, variable frame rate flags, HDR metadata.
Validation rules:
- Duration < 30 seconds → reject (not a valid episode)
- Resolution below 720p → reject (insufficient quality)
- Frame rate outside the accepted set (for example 23.976, 24, 25, 29.97, 30, 50, 59.94, 60 fps) → warn
- HDR but color space not BT.2020 → flag for correction or rejection
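These rules can be applied directly to the parsed `probe.json`. The sketch below assumes ffprobe's usual JSON field names (`codec_type`, `height`, `avg_frame_rate`, `color_transfer`, `color_primaries`); the accepted frame-rate set and the choice to treat the HDR rule as a warning are assumptions, not the author's exact policy.

```python
from fractions import Fraction

# Assumed set of accepted frame rates; adjust to your own delivery policy.
ALLOWED_FPS = {
    Fraction(24000, 1001), Fraction(24), Fraction(25),
    Fraction(30000, 1001), Fraction(30), Fraction(50),
    Fraction(60000, 1001), Fraction(60),
}
HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}  # PQ and HLG transfer functions


def parse_fps(text):
    """Parse a 'num/den' frame-rate string; return None if it is unusable."""
    try:
        return Fraction(text)
    except (ValueError, ZeroDivisionError):
        return None


def validate_probe(probe):
    """Return (errors, warnings) for a parsed ffprobe JSON document."""
    errors, warnings = [], []
    streams = probe.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        return ["no video stream found"], warnings

    duration = float(probe.get("format", {}).get("duration", 0))
    if duration < 30:
        errors.append(f"duration {duration:.1f}s is under 30s")
    if int(video.get("height", 0)) < 720:
        errors.append("resolution below 720p")
    if parse_fps(video.get("avg_frame_rate", "")) not in ALLOWED_FPS:
        warnings.append("non-standard frame rate")
    is_hdr = video.get("color_transfer") in HDR_TRANSFERS
    if is_hdr and video.get("color_primaries") != "bt2020":
        warnings.append("HDR transfer without BT.2020 primaries")
    return errors, warnings
```

Errors block the pipeline; warnings are surfaced to an operator or a correction step.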
Step 4: Transcode — The Core Production Step
Output Inventory
A single source file typically produces:
/vod/ep-001/
  mezz/original.mov             ← Source backup (cold storage)
  v_360p.mp4                    ← H.264 360p
  v_480p.mp4                    ← H.264 480p
  v_720p.mp4                    ← H.264 720p
  v_1080p.mp4                   ← H.264 1080p
  v_720p_hevc.mp4               ← H.265 720p
  v_1080p_hevc.mp4              ← H.265 1080p
  v_720p_av1.mp4                ← AV1 720p (flagship devices)
  audio_en.mp4                  ← English AAC
  audio_zh.mp4                  ← Chinese AAC
  subs_en.vtt                   ← English subtitles
  subs_zh.vtt                   ← Chinese subtitles
  thumbnails.vtt + sprite.jpg   ← Thumbnail preview (scrub bar)
Parallel Acceleration
A single episode may need 10+ output tiers. Two parallelism strategies:
By tier (simple):
Transcode cluster:
Worker 1: Source → 360p H.264
Worker 2: Source → 720p H.264
Worker 3: Source → 1080p H.264
Worker 4: Source → 720p H.265
...all workers run simultaneously
By GOP split (complex but faster):
Split source at IDR boundaries into N segments
Each segment sent to a different worker for independent transcoding
Concatenate bitstreams at the end
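The planning half of the GOP-split strategy can be sketched as below: given the timestamps of the source's IDR frames, group consecutive GOPs into contiguous time ranges, one per worker. `plan_segments` is a hypothetical helper; it assumes the keyframe timestamps were extracted beforehand (for example with ffprobe) and leaves the actual transcode and concatenation out.

```python
def plan_segments(keyframes, duration, workers):
    """Group consecutive GOPs into at most `workers` contiguous time ranges.

    keyframes: sorted IDR timestamps in seconds, starting at the first frame.
    duration:  total duration of the source in seconds.
    Returns a list of (start, end) tuples whose boundaries are all IDR frames.
    """
    if not keyframes:
        raise ValueError("at least one keyframe is required")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    gops = len(keyframes)
    n = min(workers, gops)  # cannot cut finer than one GOP per segment
    # Pick n evenly spaced keyframe indices as segment starts.
    bounds = [keyframes[(i * gops) // n] for i in range(n)] + [duration]
    return list(zip(bounds[:-1], bounds[1:]))
```

Because every cut lands on an IDR frame, each segment decodes independently, which is what makes the later bitstream concatenation possible.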
Large platforms such as Netflix and YouTube reportedly fan a single movie out across hundreds of cloud instances this way, cutting wall-clock transcode time to roughly an hour.
Transcoding Service Options
| Service | Characteristics |
|---|---|
| Self-hosted ffmpeg cluster | Maximum control, lowest cost at scale, highest ops burden |
| AWS MediaConvert | Pay-per-minute, QVBR, HDR/DRM integration |
| Bitmovin / Mux | Developer-friendly, enterprise-grade |
Cost Optimization
- Spot / Preemptible instances + checkpoint-resume: Save 70–80%
- Long-tail content: H.264 only (save transcoding + storage cost)
- Popular content: Add H.265/AV1 tiers
Step 5: Packaging and Encryption
Transcoded MP4s need to become streaming formats. See Part 5 (Protocols) and Part 8 (DRM):
Transcoded MP4s
│
▼
Shaka Packager / MediaPackage / MP4Box
│
▼
Output:
CMAF fMP4 segments
HLS master.m3u8 + media.m3u8
DASH manifest.mpd
(Optional) CENC/CBCS encrypted segments + license metadata
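To make the output concrete, here is a sketch of what an HLS master playlist contains. Packagers normally write this file for you; the generator below only illustrates its structure. The `Rendition` fields, playlist file names, bitrates, and codec strings are illustrative assumptions, and audio/subtitle groups (`EXT-X-MEDIA`) are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Rendition:
    uri: str          # media playlist for this variant (hypothetical name)
    bandwidth: int    # peak bits per second
    width: int
    height: int
    codecs: str       # RFC 6381 codec string, e.g. "avc1.640028"


def master_playlist(renditions):
    """Render a video-only HLS master playlist, lowest bitrate first."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:7", "#EXT-X-INDEPENDENT-SEGMENTS"]
    for r in sorted(renditions, key=lambda r: r.bandwidth):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth},"
            f'RESOLUTION={r.width}x{r.height},CODECS="{r.codecs}"'
        )
        lines.append(r.uri)  # the URI line must directly follow its tag
    return "\n".join(lines) + "\n"
```

The player reads this file first, then picks a variant based on measured throughput and screen size.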
Steps 6–7: Publish to Storage + Write Metadata
Transcoded and packaged assets are uploaded to production storage (typically a separate S3 “production bucket”):
/vod-production/ep-001/
  init.mp4
  seg_*.m4s
  master.m3u8
  manifest.mpd
  thumbnails/...
Simultaneously write to the business database:
INSERT INTO episodes (
episode_id, show_id, title, duration_sec,
manifest_url, thumbnail_url,
status, publish_at, ...
) VALUES (...);
Update the search index (Elasticsearch / Algolia) for discoverability.
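Because the orchestrator may retry this step, the metadata write is safer as an idempotent upsert than as a plain insert. The sketch below uses SQLite purely as a runnable stand-in for the production database; the column types and the `publish_episode` helper are assumptions, and MySQL would express the same idea with `ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS episodes (
    episode_id TEXT PRIMARY KEY, show_id TEXT, title TEXT,
    duration_sec INTEGER, manifest_url TEXT, thumbnail_url TEXT,
    status TEXT, publish_at TEXT
)"""

UPSERT = """
INSERT INTO episodes (episode_id, show_id, title, duration_sec,
                      manifest_url, thumbnail_url, status, publish_at)
VALUES (:episode_id, :show_id, :title, :duration_sec,
        :manifest_url, :thumbnail_url, :status, :publish_at)
ON CONFLICT(episode_id) DO UPDATE SET
    title = excluded.title,
    duration_sec = excluded.duration_sec,
    manifest_url = excluded.manifest_url,
    thumbnail_url = excluded.thumbnail_url,
    status = excluded.status,
    publish_at = excluded.publish_at
"""


def publish_episode(conn, row):
    """Insert or update one episode; safe to call again on retry."""
    with conn:  # commits on success, rolls back if execute() raises
        conn.execute(UPSERT, row)
```

Running the step twice for the same `episode_id` leaves a single row carrying the latest values.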
Step 8: CDN Pre-Warming
Don’t wait for the first wave of users to trigger origin fetches. Pre-warm new and anticipated-popular content by pushing init segments and the first 5–10 media segments to edge nodes globally.
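The list of objects to push can be derived mechanically from the rendition names. The sketch below builds it; the per-rendition directory layout, the `seg_N.m4s` numbering, and the example base URL are assumptions, and the actual warm-up call (a CDN prefetch API or plain GET requests through each region) is vendor-specific and left out.

```python
def prewarm_urls(base_url, renditions, segments=5):
    """Return the init segment plus the first `segments` media segments
    for every rendition, in the order they should be requested."""
    urls = []
    for name in renditions:
        urls.append(f"{base_url}/{name}/init.mp4")
        urls.extend(
            f"{base_url}/{name}/seg_{n}.m4s" for n in range(1, segments + 1)
        )
    return urls


# Example with hypothetical names: two renditions, five segments each.
targets = prewarm_urls(
    "https://cdn.example.com/vod-production/ep-001",
    ["v_720p", "v_1080p"],
)
```

Warming only the head of each rendition keeps the push cheap while still covering startup, which is where a cold cache hurts most.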
Step 9: Notify and Launch
- CMS status transitions from “processing” to “ready”
- Push notification to subscribers: “Your followed show has a new episode”
- Update homepage recommendations / rankings
- Activate ad creatives (if monetized content)
Step 10: Monitor and Regression
Post-launch continuous monitoring:
- Playback success rate
- QoE metrics within normal ranges (see Part 10)
- Anomalous titles flagged (e.g., one episode with unusually high rebuffering)
- Issues found → re-transcode or rollback
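One simple way to flag anomalous titles is to compare each title's rebuffering ratio with the catalog median. This is a sketch of that heuristic only; the 3x factor, the absolute floor, and the input shape are assumptions, and a production system would likely segment by device, region, and CDN as well.

```python
from statistics import median


def flag_anomalous(rebuffer_ratio_by_title, factor=3.0, floor=0.005):
    """Return titles whose rebuffer ratio exceeds `factor` x the median.

    `floor` prevents a very healthy catalog (median near zero) from
    flagging titles over negligible differences.
    """
    if not rebuffer_ratio_by_title:
        return []
    baseline = median(rebuffer_ratio_by_title.values())
    threshold = max(baseline * factor, floor)
    return sorted(
        title for title, ratio in rebuffer_ratio_by_title.items()
        if ratio > threshold
    )
```

Titles returned here are candidates for a closer look before deciding between a re-transcode and a rollback.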
Orchestration: Wiring the Steps Together
The 10 steps above must execute reliably in sequence with retries, parallelism, and failure branching. You can’t hardcode this in a single script.
Common Orchestration Tools
| Tool | Characteristics | Best for |
|---|---|---|
| AWS Step Functions | Serverless, visual, tight AWS integration | AWS-native, default for AWS VOD solutions |
| Temporal | Distributed workflows, strong consistency, type-safe | Self-managed control plane / multi-cloud |
| Airflow | Batch ETL, scheduled jobs | Data processing, offline analytics |
| Argo Workflows | Kubernetes-native | K8s teams |
| Custom (Kafka-driven) | Event-driven, each step publishes next event | Maximum customization |
Step Functions Example (Simplified)
{
  "StartAt": "Probe",
  "States": {
    "Probe": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:ProbeVideo",
      "Next": "ParallelTranscode"
    },
    "ParallelTranscode": {
      "Type": "Parallel",
      "Branches": [
        {"StartAt": "Transcode360p", "States": {
          "Transcode360p": {"Type": "Task",
            "Resource": "arn:aws:mediaconvert:...", "End": true}}},
        {"StartAt": "Transcode720p", "States": {
          "Transcode720p": {"Type": "Task",
            "Resource": "arn:aws:mediaconvert:...", "End": true}}},
        {"StartAt": "Transcode1080p", "States": {
          "Transcode1080p": {"Type": "Task",
            "Resource": "arn:aws:mediaconvert:...", "End": true}}}
      ],
      "Next": "Package"
    },
    "Package": {
      "Type": "Task", "Resource": "...Package...", "Next": "Prewarm"
    },
    "Prewarm": {
      "Type": "Task", "Resource": "...Prewarm...", "Next": "Notify"
    },
    "Notify": {
      "Type": "Task", "Resource": "...Notify...", "End": true
    }
  }
}
Step Functions provides built-in retries, timeouts, branching, and a visual execution graph.
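Retries and failure branches are declared per state with the `Retry`, `Catch`, and `TimeoutSeconds` fields of the Amazon States Language. The helper below is a sketch that adds them to a Task state and prints the resulting JSON; `with_resilience`, the `NotifyFailure` fallback state, and the specific intervals are hypothetical choices, not part of the example definition.

```python
import json


def with_resilience(state, fallback, timeout_s=3600):
    """Return a copy of a Task state with a timeout, retries, and a catch-all."""
    hardened = dict(state)
    hardened["TimeoutSeconds"] = timeout_s
    hardened["Retry"] = [{
        "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
        "IntervalSeconds": 10,   # wait before the first retry
        "MaxAttempts": 3,
        "BackoffRate": 2.0,      # 10s, 20s, 40s
    }]
    # After retries are exhausted, branch to the fallback state.
    hardened["Catch"] = [{"ErrorEquals": ["States.ALL"], "Next": fallback}]
    return hardened


probe = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:...:ProbeVideo",  # elided ARN kept from the example
    "Next": "ParallelTranscode",
}
hardened_probe = with_resilience(probe, fallback="NotifyFailure", timeout_s=300)
print(json.dumps(hardened_probe, indent=2))
```

The fallback state would typically alert an operator and mark the title as failed in the CMS so it can be retried manually.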
Disaster Recovery and Rollback
Multi-Region Backup
- Source files (mezzanines) replicated cross-region via S3 Cross-Region Replication
- Popular content stored in multiple regions (local delivery + failover)
Rollback Scenarios
- New title launched with content issues → one-click takedown + CDN cache invalidation
- Transcoded version has a bug → switch manifest pointer back to the previous version
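The manifest-pointer rollback works because published versions are kept side by side and only a pointer changes. A minimal in-memory sketch of that idea follows; `ManifestPointer` is a hypothetical class, and in practice the pointer would live in the metadata database, followed by a CDN invalidation of the manifest URL.

```python
class ManifestPointer:
    """Tracks published manifest versions and which one players are served."""

    def __init__(self):
        self._versions = []
        self._active = -1  # index into _versions; -1 means nothing published

    def publish(self, manifest_url):
        """Register a new version and make it the active one."""
        self._versions.append(manifest_url)
        self._active = len(self._versions) - 1

    def rollback(self):
        """Point back at the previous version and return its URL."""
        if self._active <= 0:
            raise RuntimeError("no previous version to roll back to")
        self._active -= 1
        return self._versions[self._active]

    @property
    def active(self):
        return self._versions[self._active] if self._active >= 0 else None
```

Since the older segments are never overwritten, a rollback is a metadata change rather than a re-transcode, which is what keeps it fast.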
Data Backup
- Metadata database: daily backups
- User watch progress (high volume) → tiered storage: hot data in Redis, cold data archived to S3 Glacier
A Typical Timeline
A 60-minute movie from upload to global availability:
T+0min Creator uploads mezzanine to quarantine bucket
T+5min Upload complete → triggers Step Functions
T+7min NSFW + virus scan complete
T+8min ffprobe validation passes
T+10min Parallel transcode starts (H.264 x4 + H.265 x2 + AV1 x1)
T+70min All transcodes complete (~1x real-time, parallel tiers)
T+72min Packager generates HLS + DASH + DRM
T+73min Assets uploaded to production S3
T+74min CDN pre-warming complete (NA / EU / APAC)
T+75min Metadata written, search index updated, notifications sent
T+75min Live ✅
Short-form video apps (60–90 second episodes) complete the entire pipeline in minutes.
Key Takeaways
- The VOD pipeline has 10 major steps — each can fail and needs retries + alerting.
- Upload: Multipart + MD5 dedup. Storage: Quarantine bucket + scanning.
- ffprobe validation is mandatory before transcoding.
- Transcoding is the longest step — parallelize by tier or GOP + use Spot instances.
- Packaging + DRM is the final step before publishing.
- CDN pre-warming is critical for new content launches.
- Step Functions / Temporal are the go-to orchestration tools.
- Plan for disaster recovery and rollback from day one.
Previous: Part 10: QoE Metrics