What are the steps in a VOD video processing pipeline?

A video goes through at least 10 steps from upload to global playback: upload (multipart with MD5 dedup), content scanning (virus, NSFW, copyright), ffprobe validation, transcoding to multiple bitrate tiers, packaging and DRM encryption, publishing to production storage, writing metadata to the database and search index, CDN pre-warming, launch notifications, and continuous QoE monitoring. Each step can fail, so mature platforms add retries, alerts, and manual fallbacks at every stage.

How do you upload very large video files to S3 reliably?

Use multipart upload: split the file into 5–100 MB chunks and upload 5–10 in parallel. This speeds up transfer and allows resuming on failure — only failed chunks are retried, which matters for 5–50 GB mezzanine files on unreliable networks. Combine it with an MD5 pre-check so the backend can skip files it already has. S3, GCS, and Azure Blob all support multipart natively; on S3 it's a three-step API: create, upload parts, complete.

How long does it take to transcode and publish a movie for streaming?

Roughly 75 minutes for a 60-minute movie in a typical pipeline: content scanning and ffprobe validation finish in the first few minutes, parallel transcoding (multiple H.264, H.265, and AV1 tiers running simultaneously) takes about real-time (~1 hour), then packaging, DRM, upload to production storage, and CDN pre-warming add a few more minutes. Netflix and YouTube spin up hundreds of instances for GOP-level parallelism; short-form video apps finish the whole pipeline in minutes.


VOD Deep Dive Part 11: End-to-End Workflow — From Upload to Playback

The complete 10-step VOD production pipeline: upload, content moderation, probe, transcode, package, publish, CDN pre-warm, orchestration with Step Functions and Temporal, disaster recovery.

zhuermu · May 10, 2026 · 20 min

vod · streaming · workflow · step-functions · transcoding · pipeline

This is Part 11 of the VOD Streaming Deep Dive series.

The Full Pipeline at a Glance

A video goes through at least 10 steps from file handoff to global playback:

①Upload  →  ②Content Scan  →  ③Probe  →  ④Transcode  →  ⑤Package+Encrypt
   ↓            ↓                ↓          ↓               ↓
   │        NSFW / copyright   ffprobe   Multi-rate       CMAF +
   │        fingerprint /      resolution H.264+H.265     DRM
   │        virus scan                   +AV1             Signing
   ↓
⑥Publish to Storage → ⑦Write Metadata → ⑧CDN Pre-warm → ⑨Notify → ⑩Monitor
   ↓                    ↓                 ↓               ↓         ↓
   S3 / Object        MySQL / ES        Major region    Push /     QoE
   Storage             Index             edge nodes     Homepage   Regression

Every step can fail. A mature platform has retries, alerts, and manual fallbacks at each stage.

Step 1: Upload — Getting Large Files to the Cloud

Challenges:

Source files are often 5–50 GB (ProRes / DNxHD)
Unreliable networks (cross-border, office WiFi)
Users may close their laptop or lose connection mid-upload

Multipart Upload

Split the file into 5–100 MB chunks, upload concurrently:

Large file ────► [Chunk 1] [Chunk 2] [Chunk 3] ... [Chunk N]
                     │         │         │             │
                     ▼         ▼         ▼             ▼
                S3 Multipart concurrent upload (5-10 parallel)
                              │
                              ▼
                    Merge after all complete

Benefits: Parallel speedup, resume on failure (only retry failed chunks), natively supported by S3/GCS/Azure Blob.
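
The chunking step can be sketched as a small planner. This is a minimal sketch, assuming S3's documented limits (every part except the last is at least 5 MiB, and an upload has at most 10,000 parts). The function name `plan_parts` and the 64 MiB default are illustrative choices, not part of the original pipeline.

```python
import math

MIN_PART = 5 * 1024 * 1024   # S3 minimum part size (the last part may be smaller)
MAX_PARTS = 10_000           # S3 maximum number of parts per multipart upload


def plan_parts(file_size, target_part_size=64 * 1024 * 1024):
    """Return (part_number, offset, length) tuples that cover the whole file."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Grow the part size if the target would exceed the 10,000-part limit.
    part_size = max(target_part_size, MIN_PART, math.ceil(file_size / MAX_PARTS))
    parts = []
    offset = 0
    number = 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each tuple maps directly to one ranged read of the source file, so a failed part can be re-read and retried without touching the others.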

Dedup (MD5/CRC Pre-Check)

Before uploading, the client computes an MD5 hash:

Client → Backend: "Uploading file, md5=abc123"
Backend → Client: "Already have it, skip upload"   ← instant
              or
               "Not found, please upload"

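The exchange above might look like this on both sides. It is a sketch under stated assumptions: `file_md5`, `precheck`, and the in-memory `known_hashes` mapping (hash to asset ID) are hypothetical stand-ins for whatever client library and backend lookup the platform actually uses.

```python
import hashlib


def file_md5(path, block_size=8 * 1024 * 1024):
    """Stream the file through MD5 so a large mezzanine never sits fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def precheck(md5_hex, known_hashes):
    """Backend side: tell the client whether it needs to upload at all."""
    if md5_hex in known_hashes:
        return {"action": "skip", "asset_id": known_hashes[md5_hex]}
    return {"action": "upload"}
```

Hashing a 50 GB file still takes time on the client, so some implementations hash only a sample of the file first. That variation is not shown here.
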
S3 Multipart API (Three Steps)

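import boto3

BUCKET, KEY = 'my-bucket', 'video.mov'
PART_SIZE = 64 * 1024 * 1024  # S3 requires >= 5 MiB for every part except the last

def read_chunks(path, size=PART_SIZE):
    """Yield the file in fixed-size chunks without loading it all into memory."""
    with open(path, 'rb') as f:
        while chunk := f.read(size):
            yield chunk

s3 = boto3.client('s3')

# 1. Initiate multipart upload
resp = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
upload_id = resp['UploadId']

try:
    # 2. Upload each part (part numbers start at 1)
    parts = []
    for i, chunk in enumerate(read_chunks('video.mov'), start=1):
        resp = s3.upload_part(
            Bucket=BUCKET, Key=KEY,
            UploadId=upload_id, PartNumber=i,
            Body=chunk)
        parts.append({'PartNumber': i, 'ETag': resp['ETag']})

    # 3. Complete and merge
    s3.complete_multipart_upload(
        Bucket=BUCKET, Key=KEY,
        UploadId=upload_id,
        MultipartUpload={'Parts': parts})
except Exception:
    # Abort so S3 does not keep storing (and billing for) orphaned parts
    s3.abort_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id)
    raise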
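
The loop above uploads parts one at a time. A parallel version with per-part retries could be sketched as follows. The `upload_part` argument is an assumed callable `(part_number, body) -> etag`, so the sketch runs without AWS credentials. With boto3 it could wrap `s3.upload_part(...)['ETag']`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def upload_with_retry(upload_part, part_number, body, attempts=3, backoff=0.5):
    """Call upload_part(part_number, body), retrying only this part on failure."""
    for attempt in range(1, attempts + 1):
        try:
            etag = upload_part(part_number, body)
            return {"PartNumber": part_number, "ETag": etag}
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff * 2 ** (attempt - 1))  # exponential backoff


def upload_all(upload_part, chunks, workers=8, attempts=3, backoff=0.5):
    """Upload chunks in parallel and return the parts sorted by PartNumber."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Note: this queues every chunk up front; a production uploader
        # would bound the number of in-flight parts to cap memory use.
        futures = [
            pool.submit(upload_with_retry, upload_part, number, body,
                        attempts, backoff)
            for number, body in enumerate(chunks, start=1)
        ]
        parts = [future.result() for future in futures]
    return sorted(parts, key=lambda part: part["PartNumber"])
```

The returned list has the shape `complete_multipart_upload` expects for `MultipartUpload={'Parts': ...}`.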

Step 2: Content Scanning — Compliance

Uploads land in a quarantine bucket, not production storage. Content must pass every check before it is promoted.

Required Checks

Check	Tools
Virus scan	ClamAV, VirusTotal API
NSFW detection	AWS Rekognition Content Moderation
Violence / gore / political sensitivity	Same as above
Copyright fingerprinting	Audible Magic (music), ACRCloud
Face recognition (if needed)	AWS Rekognition
Subtitle compliance	Keyword filtering, regional adaptation

Human Review

AI is a first pass only. High-value or borderline content enters a human review queue:

Reviewer watches (sampled or full)
Marks compliant or non-compliant
Records reasoning
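
One way to decide which items reach the human queue is a pair of confidence thresholds. This is a sketch only: the function `route_moderation`, the thresholds 0.95 and 0.60, and the idea of auto-rejecting at very high confidence are assumptions for illustration, not values from any moderation service.

```python
def route_moderation(scores, high_value=False, reject_at=0.95, review_at=0.60):
    """Route an upload based on classifier scores (label -> confidence in [0, 1])."""
    worst = max(scores.values(), default=0.0)
    if worst >= reject_at:
        return "reject"          # classifier is near-certain: block outright
    if worst >= review_at or high_value:
        return "human_review"    # borderline or high-value: a person decides
    return "approve"
```

For example, `route_moderation({"nudity": 0.7})` returns `"human_review"`, while a clean high-value title is still queued for review because `high_value=True` overrides the automatic approval.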

Step 3: Probe — Validate Before Transcoding

ffprobe scans the file to extract metadata:

ffprobe -v error -show_format -show_streams -of json input.mov > probe.json

Extracts: resolution, frame rate, codec, audio tracks, sample rate, duration, variable frame rate flags, HDR metadata.

Validation rules:

Duration < 30 seconds → reject (not a valid episode)
Resolution below 720p → reject (insufficient quality)
Frame rate not in the accepted set of standard rates (e.g., 60 fps) → warn
HDR but color space not BT.2020 → flag for correction or rejection
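
The rules above could be applied to the parsed `probe.json` roughly like this. It is a sketch, not the article's validator: the frame-rate allow-list is an assumption (the accepted set is not fully specified), and HDR is detected from ffprobe's `color_transfer` and `color_primaries` stream fields.

```python
from fractions import Fraction

# Assumed allow-list of common film/broadcast rates (not taken from the article).
ALLOWED_FPS = {
    Fraction(24000, 1001), Fraction(24), Fraction(25), Fraction(30000, 1001),
    Fraction(30), Fraction(50), Fraction(60000, 1001), Fraction(60),
}
HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}  # PQ and HLG, as ffprobe names them


def parse_rate(text):
    """Parse ffprobe's 'num/den' frame rate; return None if unusable (e.g. '0/0')."""
    try:
        return Fraction(text)
    except (ValueError, ZeroDivisionError):
        return None


def validate_probe(probe):
    """Return (errors, warnings) for a dict parsed from `ffprobe -of json`."""
    errors, warnings = [], []
    video = next((s for s in probe.get("streams", [])
                  if s.get("codec_type") == "video"), None)
    if video is None:
        return ["no video stream"], warnings
    if float(probe.get("format", {}).get("duration", 0)) < 30:
        errors.append("duration under 30 seconds")
    if int(video.get("height", 0)) < 720:
        errors.append("resolution below 720p")
    if parse_rate(video.get("avg_frame_rate", "")) not in ALLOWED_FPS:
        warnings.append("non-standard frame rate")
    if (video.get("color_transfer") in HDR_TRANSFERS
            and video.get("color_primaries") != "bt2020"):
        warnings.append("HDR transfer without BT.2020 primaries")
    return errors, warnings
```

Errors map to the "reject" rules and warnings to the "warn" and "flag" rules, so the orchestrator can stop the pipeline on any error and continue with warnings attached to the job.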

Step 4: Transcode — The Core Production Step

Output Inventory

A single source file typically produces:

/vod/ep-001/
  mezz/original.mov              ← Source backup (cold storage)
  v_360p.mp4                     ← H.264 360p
  v_480p.mp4                     ← H.264 480p
  v_720p.mp4                     ← H.264 720p
  v_1080p.mp4                    ← H.264 1080p
  v_720p_hevc.mp4                ← H.265 720p
  v_1080p_hevc.mp4               ← H.265 1080p
  v_720p_av1.mp4                 ← AV1 720p (flagship devices)
  audio_en.mp4                   ← English AAC
  audio_zh.mp4                   ← Chinese AAC
  subs_en.vtt                    ← English subtitles
  subs_zh.vtt                    ← Chinese subtitles
  thumbnails.vtt + sprite.jpg    ← Thumbnail preview (scrub bar)

Parallel Acceleration

A single episode may need 10+ output tiers. Two parallelism strategies:

By tier (simple):

Transcode cluster:
  Worker 1: Source → 360p H.264
  Worker 2: Source → 720p H.264
  Worker 3: Source → 1080p H.264
  Worker 4: Source → 720p H.265
  ...all workers run simultaneously

By GOP split (complex but faster):

Split source at IDR boundaries into N segments
Each segment sent to a different worker for independent transcoding
Concatenate bitstreams at the end
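
The split step can be sketched as a planner that only cuts at keyframes. This sketch assumes the keyframe (IDR) timestamps were collected in an earlier probe pass. `plan_gop_segments` is a hypothetical helper, and real systems also have to handle audio alignment and rate-control continuity, which are not shown.

```python
def plan_gop_segments(keyframe_times, duration, n_segments):
    """Group consecutive GOPs into at most n_segments (start, end) ranges in seconds."""
    if n_segments < 1 or not keyframe_times:
        raise ValueError("need at least one segment and one keyframe")
    starts = sorted(keyframe_times)
    target = duration / n_segments
    cuts = [starts[0]]
    for t in starts[1:]:
        # Cut at the first keyframe once the current segment reaches the target length.
        if t - cuts[-1] >= target and len(cuts) < n_segments:
            cuts.append(t)
    bounds = cuts + [duration]
    return list(zip(bounds[:-1], bounds[1:]))
```

Because cuts land only on keyframes, each range can be decoded and transcoded independently. A source with sparse keyframes yields fewer segments than requested.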

Netflix and YouTube spin up hundreds of cloud instances to transcode a single movie in parallel, completing in about an hour.

Transcoding Service Options

Service	Characteristics
Self-hosted ffmpeg cluster	Maximum control, lowest cost at scale, highest ops burden
AWS MediaConvert	Pay-per-minute, QVBR, HDR/DRM integration
Bitmovin / Mux	Developer-friendly, enterprise-grade

Cost Optimization

Spot / Preemptible instances + checkpoint-resume: Save 70–80%
Long-tail content: H.264 only (save transcoding + storage cost)
Popular content: Add H.265/AV1 tiers
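
The long-tail versus popular split can be expressed as a small policy function. This is a sketch: the `popular_at` threshold of 100,000 expected views is an invented placeholder, and the tier names simply mirror the output inventory listed earlier.

```python
H264_LADDER = ["360p", "480p", "720p", "1080p"]


def choose_outputs(expected_views, popular_at=100_000):
    """Return (codec, tier) pairs: H.264 for everything, extra codecs for popular titles."""
    outputs = [("h264", tier) for tier in H264_LADDER]
    if expected_views >= popular_at:
        # Extra encodes pay for themselves only when many viewers benefit.
        outputs += [("hevc", "720p"), ("hevc", "1080p"), ("av1", "720p")]
    return outputs
```

A title that later becomes popular can be re-submitted with a higher `expected_views` to add the H.265 and AV1 tiers after the fact.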

Step 5: Packaging and Encryption

Transcoded MP4s need to become streaming formats. See Part 5 (Protocols) and Part 8 (DRM):

Transcoded MP4s
       │
       ▼
Shaka Packager / MediaPackage / MP4Box
       │
       ▼
Output:
  CMAF fMP4 segments
  HLS master.m3u8 + media.m3u8
  DASH manifest.mpd
  (Optional) CENC/CBCS encrypted segments + license metadata
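
To make the HLS output concrete, here is a sketch of how a master playlist could be assembled from a list of renditions. Packagers normally generate this file themselves. The function `build_master_playlist`, the URIs, bitrates, and codec strings used with it are illustrative assumptions.

```python
def build_master_playlist(renditions):
    """Build an HLS master playlist.

    renditions: dicts with uri, bandwidth (bits/s), width, height, codecs.
    """
    lines = ["#EXTM3U", "#EXT-X-INDEPENDENT-SEGMENTS"]
    # List variants from lowest to highest bandwidth.
    for r in sorted(renditions, key=lambda r: r["bandwidth"]):
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={r["bandwidth"]},'
            f'RESOLUTION={r["width"]}x{r["height"]},CODECS="{r["codecs"]}"')
        lines.append(r["uri"])
    return "\n".join(lines) + "\n"
```

Each `#EXT-X-STREAM-INF` line is followed by the URI of that rendition's media playlist, which is what lets the player switch tiers as bandwidth changes.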

Steps 6–7: Publish to Storage + Write Metadata

Transcoded and packaged assets are uploaded to production storage (typically a separate S3 “production bucket”):

/vod-production/ep-001/
  init.mp4
  seg_*.m4s
  master.m3u8
  manifest.mpd
  thumbnails/...

Simultaneously write to the business database:

INSERT INTO episodes (
    episode_id, show_id, title, duration_sec,
    manifest_url, thumbnail_url,
    status, publish_at, ...
) VALUES (...);

Update the search index (Elasticsearch / Algolia) for discoverability.
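
The metadata write could be wrapped in a transaction with bound parameters, as in this sketch. It uses Python's built-in `sqlite3` as a stand-in for the production database, and the schema is a hypothetical subset of the columns named above (the elided columns are left out).

```python
import sqlite3


def create_schema(conn):
    """Create a minimal episodes table (assumed subset of the real schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        " episode_id TEXT PRIMARY KEY, show_id TEXT, title TEXT,"
        " duration_sec INTEGER, manifest_url TEXT, thumbnail_url TEXT,"
        " status TEXT)")


def publish_episode(conn, episode):
    """Insert the episode row atomically; the transaction rolls back on error."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO episodes (episode_id, show_id, title, duration_sec,"
            " manifest_url, thumbnail_url, status)"
            " VALUES (:episode_id, :show_id, :title, :duration_sec,"
            " :manifest_url, :thumbnail_url, 'ready')",
            episode)
```

The primary key on `episode_id` makes a retried publish step fail loudly instead of creating a duplicate row, which matters when the orchestrator re-runs a step.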

Step 8: CDN Pre-Warming

See Part 7: CDN Distribution.

Don’t wait for the first wave of users to trigger origin fetches. Pre-warm new and anticipated-popular content by pushing init segments and the first 5–10 media segments to edge nodes globally.
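
The list of objects to push could be built like this. It is a sketch: the per-rendition directory layout and the `seg_<n>.m4s` naming are assumptions for illustration, and the actual pre-warm request (a CDN API call or a fetch through each edge region) is provider-specific and not shown.

```python
def prewarm_urls(base_url, renditions, segment_count=10):
    """Return the init segment plus the first segment_count media segments per rendition."""
    urls = []
    for name in renditions:
        urls.append(f"{base_url}/{name}/init.mp4")
        urls += [f"{base_url}/{name}/seg_{i}.m4s"
                 for i in range(1, segment_count + 1)]
    return urls
```

Warming only the opening segments keeps the push small while still covering the startup phase, where a cold cache hurts most.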

Step 9: Notify and Launch

CMS status transitions from “processing” to “ready”
Push notification to subscribers: “Your followed show has a new episode”
Update homepage recommendations / rankings
Activate ad creatives (if monetized content)

Step 10: Monitor and Regression

Post-launch continuous monitoring:

Playback success rate
QoE metrics within normal ranges (see Part 10)
Anomalous titles flagged (e.g., one episode with unusually high rebuffering)
Issues found → re-transcode or rollback
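
Flagging an anomalous title can be as simple as comparing each title against the catalog median. This is a sketch with assumed parameters: `factor=3.0` and the 1% `floor` are placeholders that would need tuning against real QoE data.

```python
from statistics import median


def flag_anomalous_titles(rebuffer_ratio, factor=3.0, floor=0.01):
    """Flag titles whose rebuffering ratio is far above the catalog median.

    rebuffer_ratio: title_id -> fraction of watch time spent rebuffering.
    """
    if not rebuffer_ratio:
        return []
    baseline = median(rebuffer_ratio.values())
    # The floor avoids flagging noise when the whole catalog is very healthy.
    threshold = max(baseline * factor, floor)
    return sorted(t for t, r in rebuffer_ratio.items() if r > threshold)
```

A flagged title is a candidate for re-transcode or rollback. The check itself says nothing about the cause, which could equally be a CDN or player issue.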

Orchestration: Wiring the Steps Together

The 10 steps above must execute reliably in sequence with retries, parallelism, and failure branching. You can’t hardcode this in a single script.

Common Orchestration Tools

Tool	Characteristics	Best for
AWS Step Functions	Serverless, visual, tight AWS integration	AWS-native, default for AWS VOD solutions
Temporal	Distributed workflows, strong consistency, type-safe	Self-managed control plane / multi-cloud
Airflow	Batch ETL, scheduled jobs	Data processing, offline analytics
Argo Workflows	Kubernetes-native	K8s teams
Custom (Kafka-driven)	Event-driven, each step publishes next event	Maximum customization

Step Functions Example (Simplified)

{
  "StartAt": "Probe",
  "States": {
    "Probe": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:ProbeVideo",
      "Next": "ParallelTranscode"
    },
    "ParallelTranscode": {
      "Type": "Parallel",
      "Branches": [
        {"StartAt": "Transcode360p", "States": {
          "Transcode360p": {"Type": "Task",
            "Resource": "arn:aws:mediaconvert:...", "End": true}}},
        {"StartAt": "Transcode720p", "States": {
          "Transcode720p": {"Type": "Task",
            "Resource": "arn:aws:mediaconvert:...", "End": true}}},
        {"StartAt": "Transcode1080p", "States": {
          "Transcode1080p": {"Type": "Task",
            "Resource": "arn:aws:mediaconvert:...", "End": true}}}
      ],
      "Next": "Package"
    },
    "Package": {
      "Type": "Task", "Resource": "...Package...", "Next": "Prewarm"
    },
    "Prewarm": {
      "Type": "Task", "Resource": "...Prewarm...", "Next": "Notify"
    },
    "Notify": {
      "Type": "Task", "Resource": "...Notify...", "End": true
    }
  }
}

Step Functions provides built-in retries, timeouts, branching, and a visual execution graph.
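
The simplified definition above omits the retry and failure-branch fields. The sketch below adds them to a Task state using the Amazon States Language fields `Retry`, `Catch`, and `TimeoutSeconds`. The helper `with_retry`, the `NotifyFailure` state name, and the specific intervals are assumptions for illustration.

```python
import json


def with_retry(state, failure_state, max_attempts=3):
    """Return a copy of an ASL Task state with Retry, Catch, and a timeout added."""
    hardened = dict(state)
    hardened["TimeoutSeconds"] = 900
    hardened["Retry"] = [{
        "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
        "IntervalSeconds": 10,
        "MaxAttempts": max_attempts,
        "BackoffRate": 2.0,   # wait 10s, 20s, 40s between attempts
    }]
    hardened["Catch"] = [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": failure_state,   # hypothetical failure-handling state
    }]
    return hardened


if __name__ == "__main__":
    probe = {"Type": "Task", "Resource": "arn:aws:lambda:...:ProbeVideo",
             "Next": "ParallelTranscode"}
    print(json.dumps(with_retry(probe, "NotifyFailure"), indent=2))
```

Once retries are exhausted, the `Catch` block routes the execution to the failure state instead of aborting the workflow, which is where alerts and manual fallbacks would hook in.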

Disaster Recovery and Rollback

Multi-Region Backup

Source files (mezzanines) replicated cross-region via S3 Cross-Region Replication
Popular content stored in multiple regions (local delivery + failover)

Rollback Scenarios

New title launched with content issues → one-click takedown + CDN cache invalidation
Transcoded version has a bug → switch manifest pointer back to the previous version
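
The manifest-pointer rollback can be modeled as a version history with a "current" pointer. This in-memory class is a sketch only. `ManifestPointer` is a hypothetical name, and a real system would persist the history in the metadata database and invalidate the CDN cache after switching.

```python
class ManifestPointer:
    """Track published manifest versions so playback can fall back to the previous one."""

    def __init__(self):
        self._history = []  # published manifest URLs, oldest first

    def publish(self, manifest_url):
        self._history.append(manifest_url)

    @property
    def current(self):
        return self._history[-1] if self._history else None

    def rollback(self):
        """Drop the newest version and return it; the previous one becomes current."""
        if len(self._history) < 2:
            raise RuntimeError("no previous version to roll back to")
        return self._history.pop()
```

Keeping old renditions in storage until the new version has proven itself is what makes this switch cheap: rollback changes one pointer rather than re-running the transcode.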

Data Backup

Metadata database: daily backups
User watch progress (high volume) → tiered storage: hot data in Redis, cold data archived to S3 Glacier

A Typical Timeline

A 60-minute movie from upload to global availability:

T+0min    Creator uploads mezzanine to quarantine bucket
T+5min    Upload complete → triggers Step Functions
T+7min    NSFW + virus scan complete
T+8min    ffprobe validation passes
T+10min   Parallel transcode starts (H.264 x4 + H.265 x2 + AV1 x1)
T+70min   All transcodes complete (~1x real-time, parallel tiers)
T+72min   Packager generates HLS + DASH + DRM
T+73min   Assets uploaded to production S3
T+74min   CDN pre-warming complete (NA / EU / APAC)
T+75min   Metadata written, search index updated, notifications sent
T+75min   Live ✅

Short-form video apps (60–90 second episodes) complete the entire pipeline in minutes.

Key Takeaways

The VOD pipeline has 10 major steps — each can fail and needs retries + alerting.
Upload: Multipart + MD5 dedup. Storage: Quarantine bucket + scanning.
ffprobe validation is mandatory before transcoding.
Transcoding is the longest step — parallelize by tier or GOP + use Spot instances.
Packaging + DRM is the final step before publishing.
CDN pre-warming is critical for new content launches.
Step Functions / Temporal are the go-to orchestration tools.
Plan for disaster recovery and rollback from day one.

