VOD Deep Dive Part 11: End-to-End Workflow — From Upload to Playback

The complete 10-step VOD production pipeline: upload, content moderation, probe, transcode, package, publish, CDN pre-warm, orchestration with Step Functions and Temporal, disaster recovery.

zhuermu · 20 min
Tags: vod, streaming, workflow, step-functions, transcoding, pipeline

This is Part 11 of the VOD Streaming Deep Dive series.


The Full Pipeline at a Glance

A video goes through at least 10 steps from file handoff to global playback:

①Upload  →  ②Content Scan  →  ③Probe  →  ④Transcode  →  ⑤Package+Encrypt
   ↓            ↓                ↓          ↓               ↓
   │        NSFW / copyright   ffprobe   Multi-rate       CMAF +
   │        fingerprint /      resolution H.264+H.265     DRM
   │        virus scan                   +AV1             Signing

⑥Publish to Storage → ⑦Write Metadata → ⑧CDN Pre-warm → ⑨Notify → ⑩Monitor
   ↓                    ↓                 ↓               ↓         ↓
   S3 / Object        MySQL / ES        Major region    Push /     QoE
   Storage             Index             edge nodes     Homepage   Regression

Every step can fail. A mature platform has retries, alerts, and manual fallbacks at each stage.


Step 1: Upload — Getting Large Files to the Cloud

Challenges:

  • Source files are often 5–50 GB (ProRes / DNxHD)
  • Unreliable networks (cross-border, office WiFi)
  • Users may close their laptop or lose connection mid-upload

Multipart Upload

Split the file into 5–100 MB chunks, upload concurrently:

Large file ────► [Chunk 1] [Chunk 2] [Chunk 3] ... [Chunk N]
                     │         │         │             │
                     ▼         ▼         ▼             ▼
                S3 Multipart concurrent upload (5-10 parallel)
                                   │
                                   ▼
                    Merge after all complete

Benefits: Parallel speedup, resume on failure (only retry failed chunks), natively supported by S3/GCS/Azure Blob.

Dedup (MD5/CRC Pre-Check)

Before uploading, the client computes an MD5 hash:

Client → Backend: "Uploading file, md5=abc123"
Backend → Client: "Already have it, skip upload"   ← instant
              or
               "Not found, please upload"
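The pre-check above can be sketched as follows. This is a minimal illustration, assuming the client hashes the file in fixed-size blocks so that a 50 GB source never has to fit in memory. `needs_upload` and `known_hashes` are hypothetical stand-ins for the backend lookup, not part of any real API.

```python
import hashlib
import io


def md5_hex(stream, chunk_size=8 * 1024 * 1024):
    """Return the hex MD5 of a binary stream, reading chunk_size bytes at a time."""
    digest = hashlib.md5()
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        digest.update(block)
    return digest.hexdigest()


def needs_upload(file_md5, known_hashes):
    """Hypothetical backend check: True when no stored asset has this hash."""
    return file_md5 not in known_hashes


# Usage with an in-memory stream; a real client would open the file in 'rb' mode.
payload = io.BytesIO(b"hello world")
fingerprint = md5_hex(payload, chunk_size=4)
skip = not needs_upload(fingerprint, known_hashes={fingerprint})
```

A client-reported hash is only a hint. A backend would likely want to verify it after upload before treating two files as identical.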

S3 Multipart API (Three Steps)

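import boto3

s3 = boto3.client('s3')
PART_SIZE = 16 * 1024 * 1024  # every part except the last must be at least 5 MiB


def read_chunks(path, size=PART_SIZE):
    """Yield the file at `path` in fixed-size chunks."""
    with open(path, 'rb') as f:
        while chunk := f.read(size):
            yield chunk


# 1. Initiate multipart upload
resp = s3.create_multipart_upload(Bucket='my-bucket', Key='video.mov')
upload_id = resp['UploadId']

try:
    # 2. Upload each part (sequential here; real clients upload parts in parallel)
    parts = []
    for i, chunk in enumerate(read_chunks('video.mov'), start=1):
        resp = s3.upload_part(
            Bucket='my-bucket', Key='video.mov',
            UploadId=upload_id, PartNumber=i,
            Body=chunk)
        parts.append({'PartNumber': i, 'ETag': resp['ETag']})

    # 3. Complete and merge
    s3.complete_multipart_upload(
        Bucket='my-bucket', Key='video.mov',
        UploadId=upload_id,
        MultipartUpload={'Parts': parts})
except Exception:
    # Give up: abort so the parts already stored stop accruing charges.
    # A resumable client would keep upload_id and retry only the failed parts.
    s3.abort_multipart_upload(
        Bucket='my-bucket', Key='video.mov', UploadId=upload_id)
    raise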
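Choosing the chunk size is worth a moment of arithmetic, because S3 caps a multipart upload at 10,000 parts and requires every part except the last to be at least 5 MiB. The sketch below picks a part size that respects both limits. `plan_part_size` and its `preferred` default are illustrative names and values, not part of boto3.

```python
MIN_PART = 5 * 1024 * 1024   # S3 minimum size for every part except the last
MAX_PARTS = 10_000           # S3 maximum number of parts in one upload


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def plan_part_size(file_size, preferred=16 * 1024 * 1024):
    """Return (part_size, part_count) that fits within S3's multipart limits."""
    part_size = max(preferred, MIN_PART)
    # Grow the part size if the preferred size would need more than 10,000 parts.
    if ceil_div(file_size, part_size) > MAX_PARTS:
        part_size = ceil_div(file_size, MAX_PARTS)
    part_count = max(1, ceil_div(file_size, part_size))
    return part_size, part_count


# A 50 GiB mezzanine with 5 MiB parts would need 10,240 parts, so the size grows.
size_50g, count_50g = plan_part_size(50 * 1024**3, preferred=MIN_PART)
size_1g, count_1g = plan_part_size(1 * 1024**3)
```

For a 50 GiB source, 5 MiB parts would exceed the part limit, which is one reason larger chunks are the safer default for mezzanine files.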

Step 2: Content Scanning — Compliance

Uploads land in a quarantine bucket, not production storage. Content must pass review before it is promoted.

Required Checks

Check                                      Tools
Virus scan                                 ClamAV, VirusTotal API
NSFW detection                             AWS Rekognition Content Moderation
Violence / gore / political sensitivity    Same as above
Copyright fingerprinting                   Audible Magic (music), ACRCloud
Face recognition (if needed)               AWS Rekognition
Subtitle compliance                        Keyword filtering, regional adaptation

Human Review

AI is a first pass only. High-value or borderline content enters a human review queue:

  • Reviewer watches (sampled or full)
  • Marks compliant or non-compliant
  • Records reasoning
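One way to decide which items reach the human queue is to route on the highest moderation score. The sketch below assumes scores on a 0–100 confidence scale (the scale AWS Rekognition reports). The function name and both thresholds are hypothetical and would need tuning against real review outcomes.

```python
def route_content(scores, reject_at=95.0, review_at=50.0):
    """Return 'reject', 'human_review', or 'approve' from the highest label score."""
    top = max(scores, default=0.0)
    if top >= reject_at:
        return "reject"
    if top >= review_at:
        return "human_review"
    return "approve"


# Borderline content (a 71.5 score) goes to a reviewer rather than being auto-decided.
decision = route_content([12.0, 71.5])
```

High-value titles could bypass the thresholds and always go to review, which matches the policy described above.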

Step 3: Probe — Validate Before Transcoding

ffprobe scans the file to extract metadata:

ffprobe -v error -show_format -show_streams -of json input.mov > probe.json

Extracts: resolution, frame rate, codec, audio tracks, sample rate, duration, variable frame rate flags, HDR metadata.

Validation rules:

  • Duration < 30 seconds → reject (not a valid episode)
  • Resolution below 720p → reject (insufficient quality)
  • Frame rate not one of the accepted standard rates (for example, 60 fps) → warn
  • HDR but color space not BT.2020 → flag for correction or rejection
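The rules above can be expressed as a small validator over the parsed `probe.json`. This is a sketch under stated assumptions: the allow-list of frame rates is invented for illustration (the article does not enumerate one), HDR is detected from the `color_transfer` values ffprobe reports for PQ and HLG, and the 720p rule is simplified to a height check.

```python
from fractions import Fraction

# Assumed allow-list of frame rates; adjust to the platform's own policy.
ACCEPTED_RATES = {
    Fraction(24000, 1001), Fraction(24), Fraction(25),
    Fraction(30000, 1001), Fraction(30), Fraction(50),
    Fraction(60000, 1001), Fraction(60),
}
HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}  # PQ and HLG as named by ffprobe


def validate_probe(probe):
    """Return (errors, warnings) for a parsed ffprobe JSON document."""
    errors, warnings = [], []

    if float(probe["format"]["duration"]) < 30:
        errors.append("duration under 30 seconds")

    video = next((s for s in probe["streams"] if s.get("codec_type") == "video"), None)
    if video is None:
        return errors + ["no video stream"], warnings

    if video["height"] < 720:
        errors.append("resolution below 720p")

    try:
        rate = Fraction(video["r_frame_rate"])  # e.g. "30000/1001"
    except (ValueError, ZeroDivisionError):
        rate = None
    if rate not in ACCEPTED_RATES:
        warnings.append(f"unexpected frame rate: {video.get('r_frame_rate')}")

    is_hdr = video.get("color_transfer") in HDR_TRANSFERS
    if is_hdr and not str(video.get("color_space", "")).startswith("bt2020"):
        warnings.append("HDR transfer without BT.2020 color space")

    return errors, warnings


sample = {
    "format": {"duration": "12.5"},
    "streams": [{"codec_type": "video", "height": 480, "r_frame_rate": "15/1",
                 "color_transfer": "smpte2084", "color_space": "bt709"}],
}
errors, warnings = validate_probe(sample)
```

Errors would stop the pipeline before any transcoding cost is incurred, while warnings would be attached to the job for an operator to review.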

Step 4: Transcode — The Core Production Step

Output Inventory

A single source file typically produces:

/vod/ep-001/
  mezz/original.mov              ← Source backup (cold storage)
  v_360p.mp4                     ← H.264 360p
  v_480p.mp4                     ← H.264 480p
  v_720p.mp4                     ← H.264 720p
  v_1080p.mp4                    ← H.264 1080p
  v_720p_hevc.mp4                ← H.265 720p
  v_1080p_hevc.mp4               ← H.265 1080p
  v_720p_av1.mp4                 ← AV1 720p (flagship devices)
  audio_en.mp4                   ← English AAC
  audio_zh.mp4                   ← Chinese AAC
  subs_en.vtt                    ← English subtitles
  subs_zh.vtt                    ← Chinese subtitles
  thumbnails.vtt + sprite.jpg    ← Thumbnail preview (scrub bar)

Parallel Acceleration

A single episode may need 10+ output tiers. Two parallelism strategies:

By tier (simple):

Transcode cluster:
  Worker 1: Source → 360p H.264
  Worker 2: Source → 720p H.264
  Worker 3: Source → 1080p H.264
  Worker 4: Source → 720p H.265
  ...all workers run simultaneously

By GOP split (complex but faster):

Split source at IDR boundaries into N segments
Each segment sent to a different worker for independent transcoding
Concatenate bitstreams at the end
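The splitting step can be sketched as a planning function that turns a list of keyframe timestamps into roughly equal time ranges, one per worker. It assumes the timestamps are sorted, in seconds, and already known to be IDR frames. `plan_segments` is an illustrative name, and extracting the timestamps is left out.

```python
def plan_segments(keyframes, duration, workers):
    """Split [first keyframe, duration) into at most `workers` ranges starting on keyframes."""
    target = duration / workers
    cuts = [keyframes[0]]
    for ts in keyframes[1:]:
        # Start a new segment once the current one has reached the target length.
        if ts - cuts[-1] >= target and len(cuts) < workers:
            cuts.append(ts)
    ends = cuts[1:] + [duration]
    return list(zip(cuts, ends))


# A 16-second clip with a keyframe every 2 seconds, split across 4 workers.
segments = plan_segments([0, 2, 4, 6, 8, 10, 12, 14], duration=16, workers=4)
```

Because every range starts on a keyframe, each worker can decode its segment without frames from a neighbor, which is what makes the final concatenation possible.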

Netflix and YouTube reportedly spin up hundreds of cloud instances to transcode a single movie in parallel, which brings wall-clock time down to roughly an hour.

Transcoding Service Options

Service                       Characteristics
Self-hosted ffmpeg cluster    Maximum control, lowest cost at scale, highest ops burden
AWS MediaConvert              Pay-per-minute, QVBR, HDR/DRM integration
Bitmovin / Mux                Developer-friendly, enterprise-grade

Cost Optimization

  • Spot / Preemptible instances + checkpoint-resume: save roughly 70–80% on compute versus on-demand pricing
  • Long-tail content: H.264 only (save transcoding + storage cost)
  • Popular content: Add H.265/AV1 tiers

Step 5: Packaging and Encryption

Transcoded MP4s need to become streaming formats. See Part 5 (Protocols) and Part 8 (DRM):

Transcoded MP4s
      │
      ▼
Shaka Packager / MediaPackage / MP4Box
      │
      ▼
Output:
  CMAF fMP4 segments
  HLS master.m3u8 + media.m3u8
  DASH manifest.mpd
  (Optional) CENC/CBCS encrypted segments + license metadata
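To make the HLS output concrete, here is a minimal sketch that renders a master playlist listing one variant per video tier. The bitrates, codec strings, and URIs are illustrative assumptions, and audio and subtitle groups (`EXT-X-MEDIA`) are omitted. In practice the packager writes this file itself.

```python
def build_master_playlist(variants):
    """Render an HLS master playlist with one EXT-X-STREAM-INF entry per variant."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:6"]
    for v in variants:
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={v["bandwidth"]},'
            f'RESOLUTION={v["width"]}x{v["height"]},CODECS="{v["codecs"]}"'
        )
        lines.append(v["uri"])
    return "\n".join(lines) + "\n"


# Hypothetical ladder; real bitrates come from the transcode step.
ladder = [
    {"bandwidth": 800_000, "width": 640, "height": 360,
     "codecs": "avc1.4d401e", "uri": "v_360p/media.m3u8"},
    {"bandwidth": 3_000_000, "width": 1280, "height": 720,
     "codecs": "avc1.64001f", "uri": "v_720p/media.m3u8"},
]
master = build_master_playlist(ladder)
```

The player reads `BANDWIDTH` and `RESOLUTION` from each entry to choose a starting tier and to switch tiers as throughput changes.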

Steps 6–7: Publish to Storage + Write Metadata

Transcoded and packaged assets are uploaded to production storage (typically a separate S3 “production bucket”):

/vod-production/ep-001/
  init.mp4
  seg_*.m4s
  master.m3u8
  manifest.mpd
  thumbnails/...

Simultaneously write to the business database:

INSERT INTO episodes (
    episode_id, show_id, title, duration_sec,
    manifest_url, thumbnail_url,
    status, publish_at, ...
) VALUES (...);

Update the search index (Elasticsearch / Algolia) for discoverability.


Step 8: CDN Pre-Warming

See Part 7: CDN Distribution.

Don’t wait for the first wave of users to trigger origin fetches. Pre-warm new and anticipated-popular content by pushing init segments and the first 5–10 media segments to edge nodes globally.
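A pre-warm job mostly amounts to building the list of URLs to request through each edge region. The sketch below assumes one directory per rendition and segments numbered from 1 as `seg_<n>.m4s`. That layout, the host name, and the function name are hypothetical, and how the list is submitted depends on the CDN.

```python
def prewarm_urls(base_url, renditions, segment_count=10):
    """Return the init segment plus the first `segment_count` media segments per rendition."""
    urls = []
    for name in renditions:
        urls.append(f"{base_url}/{name}/init.mp4")
        for index in range(1, segment_count + 1):
            urls.append(f"{base_url}/{name}/seg_{index}.m4s")
    return urls


# Placeholder host; two renditions with the first five segments each.
targets = prewarm_urls("https://cdn.example.com/vod-production/ep-001",
                       ["v_720p", "v_1080p"], segment_count=5)
```

Warming only the opening segments keeps the job small while still covering the moment when most viewers start playback.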


Step 9: Notify and Launch

  • CMS status transitions from “processing” to “ready”
  • Push notification to subscribers: “Your followed show has a new episode”
  • Update homepage recommendations / rankings
  • Activate ad creatives (if monetized content)

Step 10: Monitor and Regression

Post-launch continuous monitoring:

  • Playback success rate
  • QoE metrics within normal ranges (see Part 10)
  • Anomalous titles flagged (e.g., one episode with unusually high rebuffering)
  • Issues found → re-transcode or rollback
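Flagging an anomalous title can start as a simple comparison against the catalog baseline. The sketch below marks any title whose rebuffering ratio exceeds a multiple of the median. The `factor` of 3 and the sample numbers are illustrative assumptions, not values from a real dashboard.

```python
from statistics import median


def flag_anomalous_titles(rebuffer_ratio_by_title, factor=3.0):
    """Return titles whose rebuffering ratio exceeds `factor` times the median."""
    baseline = median(rebuffer_ratio_by_title.values())
    return sorted(
        title for title, ratio in rebuffer_ratio_by_title.items()
        if ratio > factor * baseline
    )


# Hypothetical per-title rebuffering ratios (stall time / watch time).
observed = {"ep-001": 0.010, "ep-002": 0.012, "ep-003": 0.011, "ep-004": 0.080}
flagged = flag_anomalous_titles(observed)
```

Using the median rather than the mean keeps one badly encoded episode from inflating the baseline it is measured against.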

Orchestration: Wiring the Steps Together

The 10 steps above must execute reliably in sequence with retries, parallelism, and failure branching. You can’t hardcode this in a single script.

Common Orchestration Tools

Tool                     Characteristics                                        Best for
AWS Step Functions       Serverless, visual, tight AWS integration              AWS-native, default for AWS VOD solutions
Temporal                 Distributed workflows, strong consistency, type-safe   Self-managed control plane / multi-cloud
Airflow                  Batch ETL, scheduled jobs                              Data processing, offline analytics
Argo Workflows           Kubernetes-native                                      K8s teams
Custom (Kafka-driven)    Event-driven, each step publishes next event           Maximum customization

Step Functions Example (Simplified)

{
  "StartAt": "Probe",
  "States": {
    "Probe": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:ProbeVideo",
      "Next": "ParallelTranscode"
    },
    "ParallelTranscode": {
      "Type": "Parallel",
      "Branches": [
        {"StartAt": "Transcode360p", "States": {
          "Transcode360p": {"Type": "Task",
            "Resource": "arn:aws:mediaconvert:...", "End": true}}},
        {"StartAt": "Transcode720p", "States": {
          "Transcode720p": {"Type": "Task",
            "Resource": "arn:aws:mediaconvert:...", "End": true}}},
        {"StartAt": "Transcode1080p", "States": {
          "Transcode1080p": {"Type": "Task",
            "Resource": "arn:aws:mediaconvert:...", "End": true}}}
      ],
      "Next": "Package"
    },
    "Package": {
      "Type": "Task", "Resource": "...Package...", "Next": "Prewarm"
    },
    "Prewarm": {
      "Type": "Task", "Resource": "...Prewarm...", "Next": "Notify"
    },
    "Notify": {
      "Type": "Task", "Resource": "...Notify...", "End": true
    }
  }
}

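Step Functions provides built-in retries and error branching (the Retry and Catch fields on a state), timeouts (TimeoutSeconds), and a visual execution graph. The simplified definition above omits them for brevity.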
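As an illustration of the retry behavior, the sketch below defines a `Retry` entry of the kind that could be added to a Task state, and computes the wait before each retry. In Amazon States Language, each wait is the previous one multiplied by `BackoffRate`, starting from `IntervalSeconds`. The specific values are assumptions chosen for the example.

```python
# A Retry entry as it might appear on a Task state (values are illustrative).
retry_policy = {
    "ErrorEquals": ["States.TaskFailed"],
    "IntervalSeconds": 2,
    "MaxAttempts": 4,
    "BackoffRate": 2.0,
}


def retry_delays(policy):
    """Return the wait in seconds before each retry attempt under this policy."""
    interval = policy.get("IntervalSeconds", 1)
    rate = policy.get("BackoffRate", 2.0)
    attempts = policy.get("MaxAttempts", 3)
    return [interval * rate ** n for n in range(attempts)]


delays = retry_delays(retry_policy)  # waits of 2, 4, 8, and 16 seconds
```

With these values a failing transcode task would be retried four times over about 30 seconds before the error propagates to a Catch branch or fails the execution.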


Disaster Recovery and Rollback

Multi-Region Backup

  • Source files (mezzanines) replicated cross-region via S3 Cross-Region Replication
  • Popular content stored in multiple regions (local delivery + failover)

Rollback Scenarios

  • New title launched with content issues → one-click takedown + CDN cache invalidation
  • Transcoded version has a bug → switch manifest pointer back to the previous version
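The manifest-pointer rollback can be sketched as a small version history. This is an in-memory illustration only: `ManifestPointer` is a hypothetical class, and a real system would keep the history in the metadata database and invalidate the CDN cache after switching.

```python
class ManifestPointer:
    """Track published manifest URLs for one title and allow stepping back."""

    def __init__(self):
        self.history = []

    def publish(self, manifest_url):
        """Make `manifest_url` the version that players receive."""
        self.history.append(manifest_url)

    def current(self):
        """Return the live manifest URL, or None if nothing is published."""
        return self.history[-1] if self.history else None

    def rollback(self):
        """Drop the live version and return the previous one."""
        if len(self.history) < 2:
            raise RuntimeError("no previous version to roll back to")
        self.history.pop()
        return self.history[-1]


pointer = ManifestPointer()
pointer.publish("/vod-production/ep-001/v1/master.m3u8")
pointer.publish("/vod-production/ep-001/v2/master.m3u8")
restored = pointer.rollback()
```

Keeping each transcode under its own versioned prefix is what makes the switch cheap, since the previous segments are still in storage.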

Data Backup

  • Metadata database: daily backups
  • User watch progress (high volume) → tiered storage: hot data in Redis, cold data archived to S3 Glacier

A Typical Timeline

A 60-minute movie from upload to global availability:

T+0min    Creator uploads mezzanine to quarantine bucket
T+5min    Upload complete → triggers Step Functions
T+7min    NSFW + virus scan complete
T+8min    ffprobe validation passes
T+10min   Parallel transcode starts (H.264 x4 + H.265 x2 + AV1 x1)
T+70min   All transcodes complete (~1x real-time, parallel tiers)
T+72min   Packager generates HLS + DASH + DRM
T+73min   Assets uploaded to production S3
T+74min   CDN pre-warming complete (NA / EU / APAC)
T+75min   Metadata written, search index updated, notifications sent
T+75min   Live ✅

Short-form video apps (60–90 second episodes) complete the entire pipeline in minutes.


Key Takeaways

  1. The VOD pipeline has 10 major steps — each can fail and needs retries + alerting.
  2. Upload: Multipart + MD5 dedup. Storage: Quarantine bucket + scanning.
  3. ffprobe validation is mandatory before transcoding.
  4. Transcoding is the longest step — parallelize by tier or GOP + use Spot instances.
  5. Packaging + DRM is the final step before publishing.
  6. CDN pre-warming is critical for new content launches.
  7. Step Functions / Temporal are the go-to orchestration tools.
  8. Plan for disaster recovery and rollback from day one.

Previous: Part 10: QoE Metrics

Next: Part 12: Building VOD on AWS