# Video Upload & Analysis
This guide covers uploading video files, the security validation pipeline, and the 10-phase video analysis process.
## Overview
The video analysis pipeline extracts structured data from uploaded video files (scenes, transcription, objects, visual style, sentiment, and audience insights) and produces a `VideoProcessingResult` that can be used to create origin ad units.
```mermaid
graph TD
    Upload[Upload Video] --> Validate[Security Validation]
    Validate --> Metadata[1. Extract Metadata]
    Metadata --> Scenes[2. Detect Scenes]
    Scenes --> Transcribe[3. Transcribe Audio]
    Transcribe --> Keyframes[4. Extract Keyframes]
    Keyframes --> Objects[5. Detect Objects]
    Objects --> Style[6. Analyze Visual Style]
    Style --> Sentiment[7. Analyze Sentiment]
    Sentiment --> Categories[8. Categorize Scenes]
    Categories --> Script[9. Generate Script]
    Script --> Insights[10. Audience Insights]
    Insights --> Result[VideoProcessingResult]
```
## Uploading a Video
Videos are uploaded via the Ad Unit Media model, linked to a Campaign:
1. Navigate to TV Spots > Ad Unit Media > Add Ad Unit Media
2. Select the Campaign
3. Upload an MP4 video file
4. Save. The `analyze_video_task` is queued automatically.
The upload process runs security validation before saving (see below).
## Security Validation
The `VideoFileValidator` runs five validators in sequence on every upload:

| Validator | What It Checks |
|---|---|
| `FileSizeValidator` | File size between 1 byte and 500 MB (configurable) |
| `FileExtensionValidator` | File extension against the configured whitelist |
| `MimeTypeValidator` | Detected MIME type matches an allowed video type |
| `FileHeaderValidator` | File magic bytes match known video signatures (MP4 `ftyp` box, AVI RIFF, MKV/WebM EBML header) |
| `FilenameSanitizer` | Removes path traversal patterns from the filename |
All validation failures are logged to the `cw.lib.security` logger for security auditing. If any validator fails, the upload is rejected with a combined error message.
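The magic-byte check can be sketched as a standalone predicate. The signatures below come from the table above (MP4 `ftyp` box, AVI RIFF, MKV/WebM EBML header); the function name and read offsets are illustrative, not the project's actual `FileHeaderValidator` code.

```python
def looks_like_video(header: bytes) -> bool:
    """Heuristic magic-byte check over the first bytes of an upload.

    Covers the three families named in the validation table: MP4 (a
    4-byte box size followed by the literal 'ftyp'), AVI (RIFF
    container with an 'AVI ' form type), and MKV/WebM (EBML header).
    Illustrative sketch only.
    """
    if len(header) >= 12:
        # MP4/MOV: box size at offset 0, 'ftyp' at offset 4
        if header[4:8] == b"ftyp":
            return True
        # AVI: 'RIFF' at offset 0, form type 'AVI ' at offset 8
        if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
            return True
    # MKV/WebM: EBML magic at offset 0
    return header[:4] == b"\x1a\x45\xdf\xa3"
```

A validator would read roughly the first 16 bytes of the upload and reject the file when this returns `False`, logging the failure as described above.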
## Analysis Phases
The `analyze_video_task` orchestrates 10 analysis phases with progress tracking:
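Progress tracking amounts to mapping each phase's internal completion onto its slice of the overall 0-100 scale (for example, scene detection owns 10-30%). The helper below is a hypothetical sketch of that mapping using only the ranges documented in the phase descriptions; it is not the task's actual code.

```python
# Documented (start, end) progress slices for a subset of phases;
# phase keys are illustrative names, not the pipeline's identifiers.
PHASE_RANGES = {
    "metadata": (0, 10),
    "scenes": (10, 30),
    "transcription": (30, 50),
    "keyframes_objects": (50, 70),
    "audience_insights": (90, 100),
}


def overall_progress(phase: str, fraction: float) -> int:
    """Map a phase-local completion fraction (0.0-1.0) onto the
    pipeline's overall 0-100 progress scale."""
    start, end = PHASE_RANGES[phase]
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range input
    return round(start + (end - start) * fraction)
```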
### Phase 1: Extract Metadata

Extracts video metadata using PyAV:

- Duration, resolution (width/height), frame rate
- Audio channels and sample rate
- File size

Progress: 0–10%
### Phase 2: Scene Detection

Detects scene boundaries using PySceneDetect:

- Identifies scene transitions (cuts, fades, dissolves)
- Returns the scene list with start/end times and duration

Progress: 10–30%
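The core idea behind content-based scene detection can be shown with a toy stand-in: flag a cut wherever the change between consecutive frames exceeds a threshold, then turn cut indices into timed scene records. The real phase uses PySceneDetect, which compares full frame content rather than the single per-frame value used here; the threshold value is illustrative.

```python
def detect_cuts(frame_brightness: list[float], threshold: float = 30.0) -> list[int]:
    """Toy cut detector in the spirit of PySceneDetect's ContentDetector:
    a cut is declared where the difference between consecutive frame
    values exceeds the threshold. Simplified stand-in, not the real API."""
    return [
        i for i in range(1, len(frame_brightness))
        if abs(frame_brightness[i] - frame_brightness[i - 1]) > threshold
    ]


def cuts_to_scenes(cuts: list[int], total_frames: int, fps: float) -> list[dict]:
    """Turn cut frame indices into scene records with start/end times
    and duration in seconds, matching the shape this phase returns."""
    bounds = [0, *cuts, total_frames]
    return [
        {
            "start": bounds[i] / fps,
            "end": bounds[i + 1] / fps,
            "duration": (bounds[i + 1] - bounds[i]) / fps,
        }
        for i in range(len(bounds) - 1)
    ]
```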
### Phase 3: Audio Transcription

Transcribes audio using Whisper (Large v3):

- Full transcription with timestamps per segment
- Language detection and confidence score
- Speaker identification

Progress: 30–50%
### Phase 4: Keyframe Extraction & Object Detection

Extracts one keyframe per scene (the middle frame) and runs YOLOv8x object detection:

- Keyframes saved as JPEG images
- Object detection with bounding boxes and confidence scores
- Objects summarized across all frames (counts, most common classes)

Progress: 50–70%
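Summarizing detections across keyframes reduces to counting labels. A sketch of that aggregation step, assuming each detection is a dict with at least `label` and `confidence` keys (the exact shape of the pipeline's detection records is not documented here):

```python
from collections import Counter


def summarize_detections(frames: list[list[dict]]) -> dict:
    """Aggregate per-frame detections into overall counts, the most
    common classes, and a mean confidence per class, as the summary
    step described above."""
    counts: Counter = Counter()
    conf_sums: Counter = Counter()
    for detections in frames:
        for det in detections:
            counts[det["label"]] += 1
            conf_sums[det["label"]] += det["confidence"]
    return {
        "total_objects": sum(counts.values()),
        "class_counts": dict(counts),
        "most_common": [label for label, _ in counts.most_common(3)],
        "mean_confidence": {
            label: conf_sums[label] / n for label, n in counts.items()
        },
    }
```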
### Phase 5: Visual Style Analysis

Analyzes visual style from keyframes using OpenCV with k-means clustering:

- Dominant color palette (hex codes)
- Average brightness and contrast
- Lighting distribution (soft, harsh, dramatic)
- Exposure distribution (normal, overexposed, underexposed)
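The brightness and contrast metrics are the usual simple definitions: mean and standard deviation of grayscale pixel values. A minimal pure-Python sketch (the real phase computes this per keyframe with OpenCV and adds k-means clustering for the dominant palette):

```python
def brightness_contrast(pixels: list[int]) -> tuple[float, float]:
    """Brightness as the mean grayscale value (0-255) and contrast as
    the population standard deviation. Simplified stand-in for the
    OpenCV-based computation described above."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return mean, variance ** 0.5
```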
### Phase 6: Camera Work Analysis

Derived from scene data:

- Average scene duration and pacing (fast/medium/slow)
- Total scene count and number of transitions
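Pacing classification is a bucketing of the average scene duration. The cutoffs below (2 s and 5 s) are assumptions for illustration; the pipeline's real thresholds are not documented here.

```python
def classify_pacing(scene_durations: list[float]) -> str:
    """Bucket the average scene duration into fast/medium/slow,
    mirroring the pacing label this phase derives from scene data.
    Threshold values are illustrative."""
    avg = sum(scene_durations) / len(scene_durations)
    if avg < 2.0:
        return "fast"
    if avg < 5.0:
        return "medium"
    return "slow"
```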
### Phase 7: Sentiment Analysis

Combines text and visual signals:

- Text sentiment: keyword-based analysis of the transcription
- Visual sentiment: derived from brightness, color warmth, and detected objects
- Overall sentiment score and confidence
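The text half of this phase can be sketched as keyword counting: score the transcript by the balance of positive and negative hits and attach a hit-count confidence. The word lists, scoring formula, and confidence heuristic below are all illustrative assumptions.

```python
POSITIVE = {"great", "love", "best", "amazing", "new"}
NEGATIVE = {"worst", "bad", "hate", "boring"}


def text_sentiment(transcript: str) -> dict:
    """Keyword-based scoring in the spirit of the text signal above:
    score = (positive hits - negative hits) / total hits, with a
    simple hit-count confidence. Word lists are illustrative."""
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    hits = pos + neg
    if hits == 0:
        return {"label": "neutral", "score": 0.0, "confidence": 0.0}
    score = (pos - neg) / hits
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"label": label, "score": score, "confidence": min(hits / 5, 1.0)}
```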
### Phase 8: Scene Categorization

Categorizes each scene into one of 10 categories based on detected objects, transcription content, and visual features:

- People, product, lifestyle, action, nature, urban, food, technology, abstract, other
- Primary categories identified from frequency across scenes
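One way the object signal can feed categorization is a label-overlap vote: pick the category whose vocabulary matches the most detected labels. The mapping below is a guess at the approach, using COCO class names as YOLO would emit them; the project's real rules also weigh transcription content and visual features.

```python
# Hypothetical category vocabularies keyed by a subset of the 10
# categories listed above; labels are standard COCO class names.
CATEGORY_OBJECTS = {
    "people": {"person"},
    "food": {"pizza", "cup", "bowl", "sandwich"},
    "urban": {"car", "bus", "truck", "traffic light"},
    "technology": {"laptop", "cell phone", "tv", "keyboard"},
    "nature": {"bird", "dog", "horse", "potted plant"},
}


def categorize_scene(object_labels: list[str]) -> str:
    """Pick the category whose vocabulary overlaps most with the
    scene's detected labels, falling back to 'other'. Illustrative
    sketch of the object-based signal only."""
    best, best_hits = "other", 0
    for category, vocab in CATEGORY_OBJECTS.items():
        hits = sum(label in vocab for label in object_labels)
        if hits > best_hits:
            best, best_hits = category, hits
    return best
```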
### Phase 9: Script Generation

Generates a structured script by mapping transcription segments to detected scenes:

- Each scene gets a visual description and associated voiceover text
- Output format matches the TV spot JSON schema (scene number, duration, visual, audio)
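The mapping step can be sketched as time-overlap matching: attach each transcription segment to the scene whose span contains the segment's midpoint, then emit one script row per scene. The row keys follow the schema named above (scene number, duration, visual, audio); the input dict shapes are assumptions.

```python
def build_script(scenes: list[dict], segments: list[dict]) -> list[dict]:
    """Assemble script rows by assigning each transcription segment to
    the scene whose time span contains the segment's midpoint. Input
    shapes (scene 'start'/'end'/'description', segment
    'start'/'end'/'text') are assumed for illustration."""
    rows = []
    for number, scene in enumerate(scenes, start=1):
        voiceover = " ".join(
            seg["text"] for seg in segments
            if scene["start"] <= (seg["start"] + seg["end"]) / 2 < scene["end"]
        )
        rows.append({
            "scene": number,
            "duration": round(scene["end"] - scene["start"], 2),
            "visual": scene.get("description", ""),
            "audio": voiceover,
        })
    return rows
```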
### Phase 10: Audience Insights

Generates audience targeting insights using an LLM (Qwen 2.5):

- Primary audience demographics and psychographics
- Secondary audience segments
- Market potential analysis (high-fit markets, considerations)
- Schema-validated output via structured generation

Progress: 90–100%
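Schema validation of structured LLM output can be as simple as checking required keys and types before accepting the payload. The key names below are hypothetical; the pipeline's real schema is not reproduced here.

```python
# Hypothetical top-level keys for the audience insights payload.
REQUIRED_KEYS = {"primary_audience", "secondary_audiences", "market_potential"}


def validate_insights(payload: dict) -> list[str]:
    """Minimal structural check on the LLM's JSON output, standing in
    for the schema-validated structured generation described above.
    Returns a list of error strings; empty means the payload passes."""
    errors = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - payload.keys())]
    if "secondary_audiences" in payload and not isinstance(
        payload["secondary_audiences"], list
    ):
        errors.append("secondary_audiences must be a list")
    return errors
```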
## VideoProcessingResult

All analysis output is stored in a single `VideoProcessingResult` record whose JSON fields hold:

- Detected scenes with timestamps, visual descriptions, objects, colors, sentiment
- The generated script in TV spot JSON format
- The full transcription with timestamps, language, confidence, segments
- Color palette, brightness, contrast, lighting, camera work
- Aggregated object detection: total count, per-class counts and confidence
- Overall sentiment, text sentiment, visual sentiment, scores
- Scene categorization summary with primary categories
- LLM-generated demographics, psychographics, market potential
- Total processing time in seconds
- Which model/library was used for each analysis phase
## KeyFrame Records

Each extracted keyframe is saved as a `KeyFrame` record linked to the result:

- Scene number and timestamp
- Saved image file
- Per-frame object detection results (label, confidence, bounding box)
- Dominant colors for the specific frame
- Embedding field (reserved for future similarity search)
## AdUnitMedia Lifecycle

An `AdUnitMedia` record moves through the following statuses:

- Upload not yet started
- File uploaded, awaiting processing
- Analysis task running
- Analysis complete, result attached
- Analysis failed
- Results reviewed and approved
The task includes retry logic with exponential backoff (up to 3 retries, starting at 60 seconds).
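The retry schedule described above ("up to 3 retries, starting at 60 seconds") can be sketched as a doubling delay series; the doubling factor is an assumption, as is the helper name.

```python
def backoff_schedule(base: int = 60, retries: int = 3) -> list[int]:
    """Delays in seconds before each retry attempt, assuming the
    exponential backoff doubles from the documented 60-second start."""
    return [base * 2 ** attempt for attempt in range(retries)]
```

With the defaults this yields delays of 60, 120, and 240 seconds before the first, second, and third retry.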
## Viewing Results
After processing completes:
1. View the Ad Unit Media record; metadata fields (duration, resolution, etc.) are populated
2. Click through to the linked Video Processing Result for the full analysis data
3. View Key Frames as inline images with object detection overlay data
4. Use the result to inform the creation of an origin ad unit with accurate script rows