# Prompt Enhancement

This guide covers the three prompt enhancement methods, when to use each, and how enhancement integrates with image generation.

## Overview

Prompt enhancement takes a user’s source prompt and improves it for better diffusion model results — adding quality descriptors, style-specific tags, and technical details that diffusion models respond to.

Three methods are available, each with different tradeoffs:

| Method | Speed | Quality | Dependencies | Best For |
| --- | --- | --- | --- | --- |
| Rule-based | Instant | Good | None | Fast prototyping, consistent results |
| HuggingFace | ~10–30s | Better | Local GPU (3–7 GB VRAM) | Offline use, Apple Silicon |
| Anthropic API | ~3–5s | Best | `ANTHROPIC_API_KEY` | Production quality, nuanced prompts |

## Rule-Based Enhancement

The `PromptEnhancer` applies predefined tags and descriptors based on the detected or selected style. No external model is needed.

**How it works:**

1. **Style detection** — if style is set to “auto”, the enhancer scans the prompt for keywords (e.g., “photo” → Photography, “painting” → Artistic)
2. **Quality tags** — adds tags like “masterpiece”, “best quality”, “highly detailed”
3. **Style descriptors** — adds style-specific terms (e.g., Photography gets “sharp focus, natural lighting”; Cinematic gets “dramatic composition, film grain”)
4. **Negative prompt** — generates a negative prompt with common defects to avoid
5. **Creativity scaling** — the creativity level (0.0–1.0) controls how many tags are applied

Rule-based enhancement is deterministic — the same input always produces the same output.
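The steps above can be sketched in a few lines. This is an illustrative sketch only; the tag tables and function names are assumptions, not the actual `PromptEnhancer` internals, and the real tag lists are larger:

```python
# Illustrative sketch of the rule-based pass; tag tables and function
# names are assumptions, not the actual PromptEnhancer internals.
STYLE_KEYWORDS = {"photography": ["photo", "photograph"], "artistic": ["painting", "watercolor"]}
STYLE_TAGS = {
    "photography": ["sharp focus", "natural lighting"],
    "artistic": ["painterly quality", "expressive brushwork"],
}
QUALITY_TAGS = ["masterpiece", "best quality", "highly detailed"]
NEGATIVE_TAGS = ["blurry", "low quality", "deformed"]

def detect_style(prompt: str) -> str:
    """Pick a style by scanning the prompt for known keywords."""
    lowered = prompt.lower()
    for style, keywords in STYLE_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return style
    return "photography"  # arbitrary fallback for this sketch

def enhance(prompt: str, style: str = "auto", creativity: float = 0.7):
    if style == "auto":
        style = detect_style(prompt)
    # Creativity scales how many quality tags are appended.
    n = max(1, round(creativity * len(QUALITY_TAGS)))
    tags = QUALITY_TAGS[:n] + STYLE_TAGS[style]
    return f"{prompt}, {', '.join(tags)}", ", ".join(NEGATIVE_TAGS)
```

Because no randomness is involved, calling `enhance` twice with the same arguments returns identical output, which is what makes the rule-based path deterministic.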

## HuggingFace Local Model

The `HFPromptEnhancer` runs a local LLM (default: `Qwen2.5-3B-Instruct`) to rewrite the prompt with richer detail.

**Supported models:**

- `Qwen/Qwen2.5-3B-Instruct` (default, 3B parameters)
- `Qwen/Qwen2.5-7B-Instruct` (higher quality, 7B parameters)
- `gokaygokay/Flux-Prompt-Enhance` (specialized for diffusion prompts)
- `microsoft/Phi-3.5-mini-instruct` (efficient, 3.8B parameters)

The model downloads automatically from HuggingFace on first use and is cached locally. Device detection is automatic: MPS (Apple Silicon) → CUDA → CPU.
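The device-priority rule can be sketched as follows. In the real enhancer this would consult `torch.backends.mps.is_available()` and `torch.cuda.is_available()`; plain booleans are used here so the sketch carries no PyTorch dependency:

```python
# Sketch of the device-priority rule. The real enhancer would consult
# torch.backends.mps.is_available() and torch.cuda.is_available(); plain
# booleans stand in so this sketch carries no PyTorch dependency.
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    if mps_available:
        return "mps"   # Apple Silicon GPU
    if cuda_available:
        return "cuda"  # NVIDIA GPU
    return "cpu"       # fallback when no GPU is available
```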

The enhancer uses two database-backed prompt templates:

- `prompt-enhancer-system` — system prompt defining the enhancer’s role and output format
- `prompt-enhancer-user` — user prompt with the source prompt and style/creativity parameters

These templates are editable in the Django admin under Core > Prompt Templates.

If the LLM fails to produce valid JSON output, the enhancer falls back to rule-based enhancement.
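That fallback can be sketched as a JSON-validation guard. The `rule_based` helper here is a simplified stand-in for the real rule-based enhancer, and the expected output keys are assumptions:

```python
import json

def rule_based(prompt: str) -> dict:
    """Simplified stand-in for the rule-based enhancer."""
    return {"prompt": prompt + ", masterpiece, best quality",
            "negative_prompt": "blurry, low quality"}

def parse_llm_output(raw: str, source_prompt: str) -> dict:
    """Accept the LLM output only if it is valid JSON with the expected keys."""
    try:
        data = json.loads(raw)
        if isinstance(data, dict) and "prompt" in data and "negative_prompt" in data:
            return data
    except json.JSONDecodeError:
        pass
    return rule_based(source_prompt)  # fall back to the deterministic path
```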

## Anthropic API

The `LLMPromptEnhancer` uses Claude via the Anthropic API for the highest-quality enhancement.

**Setup:** Set `ANTHROPIC_API_KEY` in your `.env` file.

This method uses the same prompt templates as the HuggingFace enhancer (`prompt-enhancer-system` and `prompt-enhancer-user`), so customizations apply to both.

The creativity parameter maps to the API’s temperature setting: lower creativity produces more conservative enhancements, higher values produce more creative interpretations.

If the API call fails, the enhancer falls back to rule-based enhancement.
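A hedged sketch of how such a call might look with the Anthropic SDK's `messages.create` method. The model name and system prompt are assumptions, and for brevity the failure path here returns the source prompt unchanged rather than running the full rule-based fallback:

```python
def enhance_with_api(client, source_prompt: str, creativity: float = 0.7) -> str:
    """Ask Claude to rewrite the prompt; return the source unchanged on failure."""
    try:
        response = client.messages.create(
            model="claude-3-5-sonnet-latest",  # assumed model choice
            max_tokens=512,
            # Creativity maps directly onto temperature (both range 0.0-1.0).
            temperature=min(max(creativity, 0.0), 1.0),
            system="Rewrite diffusion prompts with richer visual detail.",
            messages=[{"role": "user", "content": source_prompt}],
        )
        return response.content[0].text
    except Exception:
        # Simplified: the real enhancer falls back to rule-based enhancement.
        return source_prompt
```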

## Style Options

Six enhancement styles control the aesthetic direction:

| Style | Effect |
| --- | --- |
| Auto-detect | Scans prompt keywords to choose the best style automatically |
| Photography | Sharp focus, natural lighting, photorealistic detail |
| Artistic | Painterly quality, expressive brushwork, creative composition |
| Realistic | Hyperrealistic rendering, fine detail, physical accuracy |
| Cinematic | Dramatic composition, film grain, anamorphic depth of field |
| Coloring Book | Clean line art, flat colors, no shading; designed for printable coloring pages |

## Creativity Level

The creativity parameter (0.0–1.0) controls enhancement intensity:

- **0.0–0.3** — conservative, minimal additions, stays close to the original prompt
- **0.4–0.6** — moderate, adds quality tags and style descriptors
- **0.7 (default)** — balanced, good mix of original intent and enhancement
- **0.8–1.0** — creative, adds more descriptors and may reinterpret the prompt

For rule-based enhancement, creativity affects how many tags from each category are applied. For LLM-based methods, it maps to the model’s temperature parameter.

## Integration with DiffusionJob

Enhancement runs as part of the image generation pipeline:

1. User creates a `Prompt` with a source prompt, enhancement method, style, and creativity
2. User creates a `DiffusionJob` linked to that prompt
3. When the job executes, the Celery task checks if the prompt has been enhanced
4. If enhancement is needed, it runs the selected method and saves the result to `enhanced_prompt`
5. The generation step uses `enhanced_prompt` if available, otherwise falls back to `source_prompt`
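The steps above can be condensed into a simplified task function. The `Prompt` dataclass and `run_diffusion_job` here are illustrative stand-ins, not the real Django models or Celery task:

```python
from dataclasses import dataclass

# Illustrative stand-ins; the real Prompt and DiffusionJob are Django
# models, and the job runs inside a Celery task.
@dataclass
class Prompt:
    source_prompt: str
    enhanced_prompt: str = ""

def run_diffusion_job(prompt: Prompt, enhance, generate):
    # Steps 3-4: enhance only if the prompt has not been enhanced yet,
    # then persist the result on the record.
    if not prompt.enhanced_prompt:
        prompt.enhanced_prompt = enhance(prompt.source_prompt)
    # Step 5: prefer enhanced_prompt, falling back to source_prompt.
    final_prompt = prompt.enhanced_prompt or prompt.source_prompt
    return generate(final_prompt)
```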

## VRAM Management

The HuggingFace enhancer loads a 3–7 GB LLM into GPU memory. Since the diffusion model also needs GPU memory, the system manages this automatically:

- The enhancer LLM is cached between enhancement tasks (warm cache)
- Before loading a diffusion model, the worker calls `_evict_enhancer()` to free VRAM
- This ensures enhancement and generation don’t compete for GPU memory
- The Celery solo pool (single-threaded worker) guarantees sequential execution

## Configuring Enhancement

When creating a Prompt in the admin UI:

1. Enter your Source Prompt
2. Set Enhancement Method to rule-based, HuggingFace, or LLM
3. Optionally set Enhancement Style (default: auto-detect)
4. Optionally adjust Creativity (default: 0.7)
5. Save — enhancement runs when the linked `DiffusionJob` executes

The enhanced prompt and generated negative prompt are saved to the Prompt record and visible on its detail page.