Research Notes
Background research and reference materials that informed design decisions.
Multilingual LLM Model Selection
The adaptation pipeline requires LLMs that can generate culturally appropriate text in the target language. Model selection is driven by language coverage, structured generation support, and memory footprint.
Evaluated Models
| Model | Parameters | Notes |
|---|---|---|
| Qwen 2.5 7B Instruct | 7B | Strong multilingual (29+ languages), best for CJK. Default primary model. |
| Qwen 2.5 3B Instruct | 3B | Lighter variant for prompt enhancement. Used by HFPromptEnhancer. |
| CohereForAI Aya Expanse 8B | 8B | 23 languages, strong for African and South Asian languages. |
| Mistral Nemo Instruct 12B | 12B | European language specialist, good for Romance/Germanic. |
| Google Gemma-2 9B | 9B | Broad coverage, good at instruction following. |
| Meta Llama 3.1 8B | 8B | Strong English, adequate multilingual. Alternative model. |
Selection Criteria
Language coverage — must handle the target language well, especially for non-Latin scripts
Structured generation — must work with Outlines library for Pydantic schema constraints
4-bit quantization — must run in ~6 GB VRAM when quantized
Chat template — must have a proper chat template for system/user message formatting
The primary_model field on each Language record maps to the best LLM for that language. Alternative models are listed for fallback.
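The mapping can be sketched as a small lookup that returns the primary model first and fallbacks after. The record layout and Hugging Face repo IDs below are illustrative assumptions, not the project's actual schema:

```python
# Hypothetical sketch of the per-language model mapping. Language codes,
# repo IDs, and the dict layout are assumptions for illustration only.
LANGUAGE_MODELS = {
    # language code: (primary_model, alternative models)
    "ja": ("Qwen/Qwen2.5-7B-Instruct", ["meta-llama/Llama-3.1-8B-Instruct"]),
    "sw": ("CohereForAI/aya-expanse-8b", ["Qwen/Qwen2.5-7B-Instruct"]),
    "fr": ("mistralai/Mistral-Nemo-Instruct-2407", ["google/gemma-2-9b-it"]),
}

def models_for(language_code: str) -> list[str]:
    """Return the primary model first, then the fallback alternatives."""
    primary, alternatives = LANGUAGE_MODELS[language_code]
    return [primary, *alternatives]
```

Keeping the fallback list per language (rather than a single global fallback) lets a weaker-but-specialized model take over when the primary fails for a particular script.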
Adaptation Pipeline Design
The 7-node pipeline was designed to separate concerns and enable independent evaluation of each quality dimension.
Why Separate Evaluation Gates
Early prototypes used a single LLM call for adaptation + evaluation. This failed because:
A single prompt could not reliably check format compliance, cultural sensitivity, concept fidelity, and brand consistency simultaneously
Failures were hard to diagnose — which dimension caused the rejection?
Retry logic could not target specific issues
The current design uses four specialized evaluation nodes, each with its own prompt template and Pydantic schema. When an evaluation fails, the specific failure reason is recorded and the writer node receives targeted feedback for revision.
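The per-gate result and feedback collection can be sketched as follows, using stdlib dataclasses as a stand-in for the project's actual Pydantic models (field names are illustrative):

```python
# Sketch of one evaluation gate's result record and feedback dispatch.
# A dataclass stands in for the real Pydantic schema; names are assumed.
from dataclasses import dataclass

@dataclass
class EvalResult:
    passed: bool
    dimension: str            # e.g. "format", "cultural", "fidelity", "brand"
    failure_reason: str = ""  # recorded, then fed back to the writer node

def collect_feedback(results: list[EvalResult]) -> list[str]:
    """Gather targeted feedback from every failed gate for the revision prompt."""
    return [f"{r.dimension}: {r.failure_reason}" for r in results if not r.passed]
```

Because each gate reports its own dimension and reason, a rejection is immediately attributable, which is exactly what the single-prompt prototype could not provide.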
Retry Budget
Each evaluation gate allows up to 3 retries. This budget was chosen empirically:
1 retry catches most formatting issues (wrong language in descriptions, missing translations)
2 retries handle most cultural adjustments
3 retries is a reasonable maximum before the pipeline should stop and surface the issue for human review
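The budget above can be sketched as a simple loop: one initial attempt plus up to three feedback-driven retries, then surfacing the failure. The function signatures are illustrative, not the pipeline's actual interfaces:

```python
# Sketch of a retry-budgeted evaluation gate. `generate` takes optional
# feedback and returns a draft; `evaluate` returns (passed, reason).
MAX_RETRIES = 3

def run_gate(generate, evaluate, max_retries=MAX_RETRIES):
    """Retry generation with targeted feedback until the gate passes,
    then stop and surface the issue for human review."""
    feedback = None
    for _ in range(max_retries + 1):  # initial attempt + retries
        draft = generate(feedback)
        passed, reason = evaluate(draft)
        if passed:
            return draft
        feedback = reason  # targeted feedback for the next revision
    raise RuntimeError(f"gate failed after {max_retries} retries: {feedback}")
```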
Structured Generation
The pipeline uses the Outlines library to constrain LLM output to Pydantic schemas. This eliminates JSON parsing errors and ensures each node produces well-formed output.
Trade-off: structured generation is slower than free-form generation and requires models that handle constrained decoding well. Larger models (7B+) are significantly more reliable than 3B models for this purpose.
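The guarantee can be sketched with a hand-written JSON Schema standing in for the Pydantic model; the Outlines call itself is shown only in a comment (its exact API varies by version, and it needs a loaded model), so treat it as an assumption:

```python
# Sketch of the schema-constrained generation step. The real pipeline
# uses Outlines with a Pydantic model; a hand-written JSON Schema stands
# in here, and the Outlines call is commented out as an assumption.
import json

EVAL_SCHEMA = {
    "type": "object",
    "properties": {
        "passed": {"type": "boolean"},
        "failure_reason": {"type": "string"},
    },
    "required": ["passed", "failure_reason"],
}

# With Outlines (assumed pre-1.0 API), this schema constrains decoding:
#   model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")
#   generator = outlines.generate.json(model, json.dumps(EVAL_SCHEMA))
#   result = generator(prompt)  # output conforms to EVAL_SCHEMA by construction

def parse_gate_output(raw: str) -> dict:
    """Constrained decoding makes raw valid JSON for EVAL_SCHEMA,
    so parsing never raises JSONDecodeError."""
    out = json.loads(raw)
    assert set(EVAL_SCHEMA["required"]) <= out.keys()
    return out
```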
Video Upload Security
Video file uploads are validated through 5 layers of security before processing.
Threat Model
Video files can carry:
Oversized payloads — denial of service via memory/disk exhaustion
Extension spoofing — .mp4 extension with non-video content
MIME type mismatches — content that doesn't match declared type
Malformed headers — files with invalid or missing magic bytes
Path traversal — filenames with ../ or special characters
Validation Chain
| Validator | What It Checks |
|---|---|
| File size | File does not exceed the configured maximum size |
| Extension | Extension is in the configured allowlist |
| MIME type | Content MIME type, detected from the file bytes, matches an allowed video type |
| Magic bytes | First bytes match known video container magic numbers |
| Filename | Strips path components, normalizes Unicode, removes special characters |
All validators run before the file is passed to the video analysis pipeline. Validation is configured via Django settings and can be adjusted per deployment.
Design Decision: the composite VideoFileValidator runs all 5 checks and reports all failures at once (rather than fail-fast), so operators see the complete list of issues.
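The report-all-failures behavior can be sketched as below. The limit, allowlist, and check messages are illustrative assumptions, not the project's Django settings, and only three of the five checks are shown:

```python
# Simplified sketch of a report-all-failures composite validator.
# MAX_BYTES and ALLOWED_EXTENSIONS are assumed values, not real settings.
import os
import unicodedata

MAX_BYTES = 500 * 1024 * 1024                   # assumed size limit
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".webm"}  # assumed allowlist

def validate_upload(filename: str, size: int) -> list[str]:
    """Run every check and return ALL failures, never stopping early."""
    errors = []
    if size > MAX_BYTES:
        errors.append("file too large")
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        errors.append(f"extension {ext!r} not allowed")
    # Sanitization check: normalized basename must equal the input name.
    if os.path.basename(unicodedata.normalize("NFKC", filename)) != filename:
        errors.append("filename needs sanitization")
    return errors
```

Returning a list instead of raising on the first failure is what lets operators fix an oversized, mis-named upload in one round trip instead of three.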
Model Memory Management
GPU VRAM is the primary constraint. The system manages memory through:
Sequential Execution
The Celery solo pool ensures only one task runs at a time. Combined with module-level model caching, this means:
Only one diffusion model is loaded at a time
Switching models evicts the previous one and frees VRAM
Prompt enhancement and pipeline LLMs are evicted before image generation
Eviction Strategy
Before image generation:
_evict_enhancer() — clears cached HFPromptEnhancer (Qwen 3B)
_evict_pipeline_model() — clears PipelineModelLoader singleton (Qwen 7B)
gc.collect() + torch.cuda.empty_cache() / torch.mps.empty_cache()
This ensures the full VRAM budget is available for the diffusion model.
Quantization
LLM models support load_in_4bit for ~4x memory reduction (via bitsandbytes)
Diffusion models support load_in_8bit (Qwen-Image only)
All diffusion models use bfloat16 or float16 precision
VAE is kept in float32 on Apple Silicon (MPS) to avoid precision issues
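As a configuration sketch, 4-bit loading via transformers and bitsandbytes typically looks like the following; the model ID and quantization options are assumptions, and the actual load is left commented out because it downloads multi-GB weights:

```python
# Configuration sketch only. The quant type and model ID are assumptions;
# the from_pretrained call is commented out to avoid a large download.
QUANT_KWARGS = dict(
    load_in_4bit=True,          # ~4x memory reduction vs. 16-bit weights
    bnb_4bit_quant_type="nf4",  # assumed quantization type
)

# import torch
# from transformers import AutoModelForCausalLM, BitsAndBytesConfig
#
# model = AutoModelForCausalLM.from_pretrained(
#     "Qwen/Qwen2.5-7B-Instruct",  # assumed model ID
#     quantization_config=BitsAndBytesConfig(
#         bnb_4bit_compute_dtype=torch.bfloat16, **QUANT_KWARGS),
#     device_map="auto",
# )
```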