# Troubleshooting

Common issues and their solutions, organized by category.
## Setup & Environment

### ModuleNotFoundError: 'cw'

The package isn’t installed in editable mode:

```bash
uv pip install -e .
```

This is needed after a fresh clone or when the src/ layout changes.
### DJANGO_SETTINGS_MODULE not set

Ensure you’re running commands through uv run:

```bash
uv run manage.py migrate     # correct
python manage.py migrate     # may fail without env setup
```
### Missing static files in admin

Run collectstatic:

```bash
uv run manage.py collectstatic --noinput
```
### Migration errors

If you see migration conflicts after pulling changes:

```bash
uv run manage.py migrate --run-syncdb
```

For persistent issues, check that PostgreSQL is running:

```bash
docker compose ps
```
## Model Loading

### Out of memory (OOM)

CUDA or MPS out-of-memory errors during model loading or generation:

- **Check the warm cache** — ensure only one model is loaded at a time. The warm cache in `tasks.py` should evict the previous model automatically.
- **Reduce model size** — use `load_in_8bit: true` in presets.json for large models (e.g., Qwen-Image-2512).
- **Enable CPU offload** — set `use_sequential_cpu_offload: true` in the model’s settings.
- **Use a smaller model** — switch to a distilled variant (e.g., DreamShaper XL Lightning instead of Juggernaut XL).
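The single-slot warm-cache behavior described above can be sketched as follows. This is a minimal illustration with assumed names, not the actual implementation in `tasks.py`:

```python
class WarmCache:
    """Sketch of a single-slot model cache: loading a new model evicts
    the previous one, so only one model occupies GPU memory at a time."""

    def __init__(self, loader):
        self._loader = loader  # callable: model_id -> loaded model
        self._key = None
        self._model = None

    def get(self, model_id):
        if self._key != model_id:
            # Drop the old model before loading the new one. A real
            # implementation would also free GPU memory here (e.g. via
            # torch.cuda.empty_cache()).
            self._model = None
            self._model = self._loader(model_id)
            self._key = model_id
        return self._model
```

If OOM errors persist despite eviction, the leak is usually a lingering reference to the old model (held by a task closure or module global) that keeps it alive after eviction.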
### Model not found

If a HuggingFace model fails to download:

- Check internet connectivity
- Verify the model path in `data/presets.json` matches the HuggingFace repo ID
- Run `uv run manage.py preload_models` to pre-download all models
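A quick local sanity check can catch malformed repo IDs before any download is attempted. The presets structure and the `model_path` key below are assumptions for illustration, not taken from the project's code:

```python
import re

# HuggingFace repo IDs have the form "namespace/name"
# (letters, digits, '-', '_', '.').
HF_REPO_RE = re.compile(r"^[\w.-]+/[\w.-]+$")

def check_preset_repo_ids(presets: dict) -> list[str]:
    """Return names of presets whose assumed 'model_path' value
    doesn't look like a HuggingFace repo ID."""
    bad = []
    for name, cfg in presets.items():
        if not HF_REPO_RE.match(cfg.get("model_path", "")):
            bad.append(name)
    return bad
```

A local path or a typo with spaces fails the pattern immediately, which is cheaper than waiting for a 404 from the Hub.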
### Wrong dtype

If generation produces garbage or errors about tensor types:

- Ensure `dtype` is set correctly in presets.json (most models use `bfloat16`)
- Some older models may need `float16` instead
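One way to fail fast on a mistyped dtype is to validate the string before model loading. The accepted set below is an assumption for illustration:

```python
# Dtype names assumed valid in presets.json for this sketch; the
# project may accept more or fewer.
VALID_DTYPES = {"bfloat16", "float16", "float32"}

def validate_dtype(name: str) -> str:
    """Return the dtype name unchanged if recognized, else raise
    with a hint instead of producing garbage output downstream."""
    if name not in VALID_DTYPES:
        raise ValueError(
            f"unknown dtype {name!r}; expected one of {sorted(VALID_DTYPES)}"
        )
    return name
```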
## LoRA Issues

### Architecture mismatch

“LoRA not compatible” errors occur when a LoRA’s base_architecture doesn’t match the selected model:

- SDXL LoRAs only work with SDXL models (Juggernaut XL, DreamShaper XL, SDXL Turbo)
- SD15 LoRAs only work with SD 1.5 models (Realistic Vision)
- Check the LoRA’s `base_architecture` field in presets.json
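The compatibility rule reduces to a string comparison on the architecture labels. A minimal sketch, with the function name and case-insensitivity as assumptions:

```python
def lora_compatible(lora_arch: str, model_arch: str) -> bool:
    """A LoRA is usable only when its base_architecture matches the
    selected model's architecture (e.g. 'sdxl' vs 'sd15')."""
    return lora_arch.strip().lower() == model_arch.strip().lower()
```

Comparing normalized strings avoids false mismatches from casing differences between presets.json entries.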
### Missing trigger words

If a LoRA has no visible effect:

- Check the LoRA’s `prompt` field in presets.json — trigger words must be present in the prompt
- Trigger words are automatically appended when the LoRA is active
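The auto-append behavior can be sketched like this (names and the comma-join format are assumptions; the project's implementation may differ):

```python
def apply_trigger_words(prompt: str, trigger_words: str) -> str:
    """Append a LoRA's trigger words to the prompt unless they are
    already present (case-insensitive check to stay idempotent)."""
    if trigger_words and trigger_words.lower() not in prompt.lower():
        return f"{prompt}, {trigger_words}"
    return prompt
```

If a LoRA still has no effect with trigger words present, the next things to check are its weight/scale and the architecture match above.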
### CivitAI download failures

If auto-download fails:

- Verify `CIVITAI_API_KEY` is set in `.env`
- Check the AIR URN format: `urn:air:{ecosystem}:lora:civitai:{modelId}@{versionId}`
- CivitAI may rate-limit requests — retry after a few minutes
- Check `logs/tasks.log` for the `cw.lib.civitai` logger
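The AIR URN format above can be checked mechanically. A sketch of a parser for exactly that shape (the numeric-ID constraint is an assumption):

```python
import re

# Matches the format documented above:
# urn:air:{ecosystem}:lora:civitai:{modelId}@{versionId}
AIR_URN_RE = re.compile(
    r"^urn:air:(?P<ecosystem>[^:]+):lora:civitai:"
    r"(?P<model_id>\d+)@(?P<version_id>\d+)$"
)

def parse_air_urn(urn: str) -> dict:
    """Split an AIR URN into its parts, raising on malformed input."""
    m = AIR_URN_RE.match(urn)
    if not m:
        raise ValueError(f"malformed AIR URN: {urn!r}")
    return m.groupdict()
```

Running a suspect URN through a check like this distinguishes a formatting mistake from an actual API or rate-limit failure.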
## Celery & Task Issues

### Tasks stuck in “pending”

- Verify the Celery worker is running: `uv run celery -A cw worker -Q default`
- Check that Valkey/Redis is running: `docker compose ps`
- Check Flower at http://localhost:5555 for worker status
Worker crashes
Check
logs/celery.logandlogs/worker_default.logfor error messagesOOM kills are the most common cause — see the Model Loading section above
The
solopool runs one task at a time; a crash during generation kills the worker process
### Task timeouts

Large models or complex prompts may take several minutes. Check:

- The task is actually running (not stuck) via Flower
- GPU utilization — verify the GPU is active during generation
- Network issues — HuggingFace model downloads can be slow on first run
## Adaptation Pipeline

### Evaluation gate loops

If the pipeline repeatedly fails evaluations and exhausts retries:

- Check `evaluation_history` on the VideoAdUnit for specific failure reasons
- Review the evaluation prompt templates — they may be too strict for the target market
- Try adjusting the evaluation templates via Core > Prompt Templates
- Each gate allows up to 3 retries before the pipeline stops
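The gate-with-retries pattern described above can be sketched generically. Function names are illustrative; the real pipeline's gates and history format will differ:

```python
def run_gate(generate, evaluate, max_retries=3):
    """Regenerate until the evaluator passes, up to max_retries
    attempts, recording each verdict; then give up with the history
    attached so failures are diagnosable."""
    history = []
    for attempt in range(1, max_retries + 1):
        result = generate()
        passed = evaluate(result)
        history.append((attempt, passed))
        if passed:
            return result, history
    raise RuntimeError(f"gate failed after {max_retries} attempts: {history}")
```

A gate that loops to exhaustion usually means the evaluator's pass criterion is stricter than what the generator can realistically produce, which is why loosening the evaluation templates is the first lever to try.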
### Schema validation errors

The pipeline uses structured generation (Outlines library) to constrain LLM output to Pydantic schemas. If validation fails:

- Check that the LLM model supports the required output format
- Larger models (7B) are more reliable at structured generation than smaller ones (3B)
- Check `logs/tasks.log` for the `cw.lib.adaptation` logger
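To build intuition for what "validation fails" means here, a minimal stand-in for schema validation of LLM output looks like this. The real pipeline uses Outlines with Pydantic models; this sketch only illustrates the two failure modes to look for in the logs (unparseable JSON vs. parseable JSON that violates the schema):

```python
import json

def parse_structured_output(raw: str, required: dict) -> dict:
    """Parse LLM output as JSON and check required fields and their
    types. `required` maps field name -> expected Python type."""
    data = json.loads(raw)  # failure mode 1: output isn't valid JSON
    for field, typ in required.items():
        if field not in data:  # failure mode 2: schema violation
            raise ValueError(f"missing field {field!r}")
        if not isinstance(data[field], typ):
            raise ValueError(f"field {field!r} should be {typ.__name__}")
    return data
```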
### Pipeline model loading errors

- Verify LLM models are configured and active under Core > LLM Models
- Check that PipelineSettings has a global default model set
- Ensure the `PipelineModelLoader` can find the model on HuggingFace
## Docker & Infrastructure

### Port conflicts

Default ports used by the application:

| Port | Service | Fix |
|---|---|---|
| 5435 | PostgreSQL | Check for existing PostgreSQL instances |
| 6379 | Valkey/Redis | Check for existing Redis instances |
| 8000 | Django | Kill existing Django dev servers |
| 3000 | Grafana | Check for existing Grafana instances |
| 3100 | Loki | Usually no conflicts |
| 5555 | Flower | Kill existing Flower instances |
### Container startup order

Use `./start.sh`, which waits for containers to be healthy before starting Django and workers. If you run `honcho start` directly, containers may not be ready when Django starts — you’ll see database connection errors that resolve after a few seconds.
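The wait-for-healthy pattern that `./start.sh` relies on is a generic poll-until-ready loop. A sketch with injectable clock and sleep (so it can be exercised without real delays); the function name and defaults are assumptions:

```python
import time

def wait_until(check, timeout=60.0, interval=2.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `check` until it returns True or `timeout` seconds elapse.
    Returns True on success, False on timeout."""
    deadline = clock() + timeout
    while clock() < deadline:
        if check():
            return True
        sleep(interval)
    return False
```

In practice `check` would be something like a TCP connect to the database port or a `docker compose ps` health query; if it never flips to healthy, inspect the container logs rather than raising the timeout.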
## Diagnostic Commands

Quick checks for common issues:

```bash
# Check all containers are running
docker compose ps

# Check worker status
curl -s http://localhost:5555/api/workers | python -m json.tool

# Check recent errors in task logs
jq 'select(.levelname == "ERROR")' logs/tasks.log | tail -20

# Check Django is responding
curl -s http://localhost:8000/app/ | head -5

# Verify database connectivity
uv run manage.py check --database default
```
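If jq isn't installed, the same log filter can be done in a few lines of Python (a sketch equivalent to the `jq 'select(.levelname == "ERROR")'` check above, assuming the logs are JSON lines):

```python
import json

def recent_errors(lines, limit=20):
    """Keep JSON log lines whose levelname is ERROR, returning the
    last `limit` entries; non-JSON lines are skipped silently."""
    errors = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a plain-text traceback line
        if entry.get("levelname") == "ERROR":
            errors.append(entry)
    return errors[-limit:]
```

Used as `recent_errors(open("logs/tasks.log"))`, this mirrors the jq pipeline without any external dependency.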