Troubleshooting

Common issues and their solutions, organized by category.

Setup & Environment

ModuleNotFoundError: 'cw'

This usually means the cw package isn’t installed in editable mode. Install it with:

uv pip install -e .

This is needed after a fresh clone or when the src/ layout changes.

DJANGO_SETTINGS_MODULE not set

Ensure you’re running commands through uv run:

uv run manage.py migrate    # correct
python manage.py migrate    # may fail without env setup

Missing static files in admin

Run collectstatic:

uv run manage.py collectstatic --noinput

Migration errors

If you see migration conflicts after pulling changes:

uv run manage.py migrate --run-syncdb

For persistent issues, check that PostgreSQL is running:

docker compose ps

Model Loading

Out of memory (OOM)

CUDA or MPS out-of-memory errors during model loading or generation:

  • Check warm cache — ensure only one model is loaded at a time. The warm cache in tasks.py should evict the previous model automatically.

  • Reduce model size — use load_in_8bit: true in presets.json for large models (e.g., Qwen-Image-2512)

  • Enable CPU offload — set use_sequential_cpu_offload: true in the model’s settings

  • Use a smaller model — switch to a distilled variant (e.g., DreamShaper XL Lightning instead of Juggernaut XL)
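The single-model constraint in the first bullet can be pictured as a one-slot cache: loading a new model first evicts whatever is resident. This is a minimal, hypothetical sketch — the names (WarmCache, load_pipeline) are illustrative, not the actual code in tasks.py:

```python
class WarmCache:
    """Holds at most one loaded model; loading a new one evicts the old."""

    def __init__(self, loader):
        self._loader = loader      # callable: model_id -> pipeline object
        self._model_id = None
        self._pipeline = None

    def get(self, model_id):
        if self._model_id == model_id and self._pipeline is not None:
            return self._pipeline  # cache hit: reuse the resident model
        self.evict()               # free the previous model first
        self._pipeline = self._loader(model_id)
        self._model_id = model_id
        return self._pipeline

    def evict(self):
        # Drop the reference so its memory can be reclaimed; a real
        # implementation would also release GPU memory explicitly
        # (e.g., torch.cuda.empty_cache()).
        self._pipeline = None
        self._model_id = None
```

If OOM persists even with one model resident, the eviction step is the place to look — a lingering reference to the old pipeline keeps its GPU memory alive.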

Model not found

If a HuggingFace model fails to download:

  • Check internet connectivity

  • Verify the model path in data/presets.json matches the HuggingFace repo ID

  • Run uv run manage.py preload_models to pre-download all models

Wrong dtype

If generation produces garbage or errors about tensor types:

  • Ensure dtype is set correctly in presets.json (most models use bfloat16)

  • Some older models may need float16 instead
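A loader can catch dtype misconfiguration early by validating the string before model load. This is a dependency-free sketch (the real loader would map the string to a torch dtype such as torch.bfloat16; the function name and default are assumptions):

```python
SUPPORTED_DTYPES = {"bfloat16", "float16", "float32"}

def resolve_dtype(preset: dict, default: str = "bfloat16") -> str:
    """Return a validated dtype name from a preset entry, or raise early."""
    dtype = preset.get("dtype", default)
    if dtype not in SUPPORTED_DTYPES:
        # Failing at config time beats garbage output at generation time.
        raise ValueError(
            f"Unsupported dtype {dtype!r}; expected one of {sorted(SUPPORTED_DTYPES)}"
        )
    return dtype
```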

LoRA Issues

Architecture mismatch

“LoRA not compatible” errors occur when a LoRA’s base_architecture doesn’t match the selected model:

  • SDXL LoRAs only work with SDXL models (Juggernaut XL, DreamShaper XL, SDXL Turbo)

  • SD15 LoRAs only work with SD 1.5 models (Realistic Vision)

  • Check the LoRA’s base_architecture field in presets.json
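The compatibility rule above reduces to a single field comparison. A hypothetical helper (field names mirror presets.json, but this is not the project's actual code):

```python
def lora_compatible(model: dict, lora: dict) -> bool:
    """True when the LoRA's base architecture matches the model's."""
    return model.get("architecture") == lora.get("base_architecture")
```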

Missing trigger words

If a LoRA has no visible effect:

  • Check the LoRA’s prompt field in presets.json — trigger words must be present in the prompt

  • Trigger words are automatically appended when the LoRA is active
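The automatic appending described above can be sketched as follows — append only trigger words not already present, so user-typed triggers aren't duplicated (illustrative; the actual implementation may differ):

```python
def apply_trigger_words(prompt: str, trigger_words: list[str]) -> str:
    """Append any trigger words that are not already in the prompt."""
    missing = [w for w in trigger_words if w.lower() not in prompt.lower()]
    if not missing:
        return prompt
    return prompt + ", " + ", ".join(missing)
```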

CivitAI download failures

If auto-download fails:

  • Verify CIVITAI_API_KEY is set in .env

  • Check the AIR URN format: urn:air:{ecosystem}:lora:civitai:{modelId}@{versionId}

  • CivitAI may rate-limit requests — retry after a few minutes

  • Check logs/tasks.log for the cw.lib.civitai logger
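To sanity-check the URN format before blaming the network, the pattern quoted above can be validated with a regex. A sketch (the parser name and return shape are assumptions, not the project's API):

```python
import re

# Matches: urn:air:{ecosystem}:lora:civitai:{modelId}@{versionId}
AIR_URN_RE = re.compile(
    r"^urn:air:(?P<ecosystem>[^:]+):lora:civitai:(?P<model_id>\d+)@(?P<version_id>\d+)$"
)

def parse_air_urn(urn: str) -> dict:
    """Parse a CivitAI LoRA AIR URN, raising ValueError on a bad format."""
    m = AIR_URN_RE.match(urn)
    if m is None:
        raise ValueError(f"Malformed AIR URN: {urn!r}")
    return {
        "ecosystem": m.group("ecosystem"),
        "model_id": int(m.group("model_id")),
        "version_id": int(m.group("version_id")),
    }
```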

Celery & Task Issues

Tasks stuck in “pending”

  • Verify the Celery worker is running: uv run celery -A cw worker -Q default

  • Check that Valkey/Redis is running: docker compose ps

  • Check Flower at http://localhost:5555 for worker status

Worker crashes

  • Check logs/celery.log and logs/worker_default.log for error messages

  • OOM kills are the most common cause — see the Model Loading section above

  • The solo pool runs one task at a time; a crash during generation kills the worker process

Task timeouts

Large models or complex prompts may take several minutes. Check:

  • The task is actually running (not stuck) via Flower

  • GPU utilization — verify the GPU is active during generation

  • Network issues — HuggingFace model downloads can be slow on first run

Adaptation Pipeline

Evaluation gate loops

If the pipeline repeatedly fails evaluations and exhausts retries:

  • Check evaluation_history on the VideoAdUnit for specific failure reasons

  • Review the evaluation prompt templates — they may be too strict for the target market

  • Try adjusting the evaluation templates via Core > Prompt Templates

  • Each gate allows up to 3 retries before the pipeline stops
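The gate-with-retries behaviour can be sketched as a loop that records every attempt, analogous to the evaluation_history mentioned above (function names are illustrative, not the pipeline's actual API):

```python
def run_gate(generate, evaluate, max_retries: int = 3):
    """Run generate() until evaluate() passes, or retries are exhausted.

    Returns (output, history); output is None when all retries fail.
    """
    history = []
    for attempt in range(1, max_retries + 1):
        output = generate()
        passed, reason = evaluate(output)
        history.append({"attempt": attempt, "passed": passed, "reason": reason})
        if passed:
            return output, history
    return None, history  # pipeline stops; inspect history for reasons
```

When a gate loops, the history shows whether the failure reason is the same every time (too-strict evaluation) or varies (unstable generation).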

Schema validation errors

The pipeline uses structured generation (Outlines library) to constrain LLM output to Pydantic schemas. If validation fails:

  • Check that the LLM model supports the required output format

  • Larger models (7B) are more reliable at structured generation than smaller ones (3B)

  • Check logs/tasks.log for the cw.lib.adaptation logger
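To see why malformed output fails the gate, here is a stdlib-only sketch of the validation step. The real pipeline constrains output with Outlines and validates against Pydantic schemas; this hypothetical helper (and its example schema) only illustrates the failure modes:

```python
import json

# Example schema: required fields and their expected types (illustrative).
REQUIRED_FIELDS = {"headline": str, "body": str, "cta": str}

def validate_llm_output(raw: str) -> dict:
    """Parse and validate LLM output; raise ValueError on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Output is not valid JSON: {exc}") from exc
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"Missing required field: {field}")
        if not isinstance(data[field], ftype):
            raise ValueError(f"Field {field!r} must be {ftype.__name__}")
    return data
```

Smaller models tend to fail at the first hurdle (invalid JSON) or the second (dropped fields), which is why larger models are more reliable here.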

Pipeline model loading errors

  • Verify LLM models are configured and active under Core > LLM Models

  • Check that PipelineSettings has a global default model set

  • Ensure the PipelineModelLoader can find the model on HuggingFace

Docker & Infrastructure

Port conflicts

Default ports used by the application:

Port   Service        Fix
5435   PostgreSQL     Check for existing PostgreSQL instances
6379   Valkey/Redis   Check for existing Redis instances
8000   Django         Kill existing Django dev servers
3000   Grafana        Check for existing Grafana instances
3100   Loki           Usually no conflicts
5555   Flower         Kill existing Flower instances

Container startup order

Use ./start.sh which waits for containers to be healthy before starting Django and workers. If using honcho start directly, containers may not be ready when Django starts — you’ll see database connection errors that resolve after a few seconds.

Diagnostic Commands

Quick checks for common issues:

# Check all containers are running
docker compose ps

# Check worker status
curl -s http://localhost:5555/api/workers | python -m json.tool

# Check recent errors in task logs
cat logs/tasks.log | jq 'select(.levelname == "ERROR")' | tail -20

# Check Django is responding
curl -s http://localhost:8000/app/ | head -5

# Verify database connectivity
uv run manage.py check --database default