Data Schemas (JSON)

Field-by-field specification for all JSON data files under data/.

presets.json

Top-level structure with two arrays:

{ "models": [...], "loras": [...] }

See Configuration for the full model and LoRA field reference.

llm_models.json

LLM model configurations for text generation.

Field	Type	Required	Description
`model_id`	string	Yes	HuggingFace identifier (e.g., `Qwen/Qwen2.5-7B-Instruct`)
`name`	string	Yes	Display name (e.g., “Qwen 2.5 7B”)
`notes`	string	No	Capabilities or use-case notes
`is_active`	bool	Yes	Whether model is available for selection
`load_in_4bit`	bool	Yes	Whether to use 4-bit quantization

Example:

{
  "model_id": "Qwen/Qwen2.5-7B-Instruct",
  "name": "Qwen 2.5 7B",
  "notes": "Strong multilingual, best for CJK languages",
  "is_active": true,
  "load_in_4bit": true
}
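The table above can be turned into a small structural check. The following is a minimal sketch, not the project's own loader: `LLM_MODEL_FIELDS` and `check_llm_model` are hypothetical names, and only the field names, types, and required flags come from the table.

```python
# Field name -> (expected type, required?), copied from the table above.
# The checker itself is a hypothetical helper, not part of the project.
LLM_MODEL_FIELDS = {
    "model_id": (str, True),
    "name": (str, True),
    "notes": (str, False),
    "is_active": (bool, True),
    "load_in_4bit": (bool, True),
}


def check_llm_model(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    for field, (expected_type, required) in LLM_MODEL_FIELDS.items():
        if field not in entry:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(entry[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    return problems


example = {
    "model_id": "Qwen/Qwen2.5-7B-Instruct",
    "name": "Qwen 2.5 7B",
    "notes": "Strong multilingual, best for CJK languages",
    "is_active": True,
    "load_in_4bit": True,
}
print(check_llm_model(example))
```

An entry that omits the optional `notes` field passes; one that stores `is_active` as a string is reported.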

regions.json

Geographic regions for adaptation targeting.

Field	Type	Required	Description
`code`	string	Yes	Region code (e.g., `NA`, `LATAM`, `DACH`)
`name`	string	Yes	Full name (e.g., “North America”)
`description`	string	No	Regional description
`insights`	array	Yes	`[{heading, points[]}]` — cultural patterns and media environment
`is_active`	bool	Yes	Whether region is available

countries.json

Countries with default language references.

Field	Type	Required	Description
`code`	string	Yes	ISO 3166-1 alpha-2 code (e.g., `US`, `GB`, `DE`)
`name`	string	Yes	Full country name
`default_language`	string	Yes	FK to `languages.json[].code` (e.g., `en-US`)
`insights`	array	Yes	`[{heading, points[]}]` — regulatory and cultural considerations
`notes`	string	No	Additional notes
`is_active`	bool	Yes	Whether country is available

languages.json

Languages with LLM model references.

Field	Type	Required	Description
`code`	string	Yes	IETF BCP 47 code (e.g., `en-US`, `de-AT`, `zh-CN`)
`name`	string	Yes	Full name with region (e.g., “English (United States)”)
`base_language`	string	Yes	2-letter base code (e.g., `en`, `de`)
`primary_model`	string	Yes	FK to `llm_models.json[].model_id`
`alternative_models`	array	Yes	Array of alternative `model_id` strings (can be empty)
`insights`	array	Yes	`[{heading, points[]}]` — language characteristics and localization guidance
`notes`	string	No	Additional notes
`is_active`	bool	Yes	Whether language is available
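The FK columns described so far (`default_language` and `primary_model`) can be checked for dangling references. This sketch assumes each file holds a JSON array of objects, which this page does not state explicitly. `dangling_refs` is a hypothetical helper, and the rows are trimmed to the fields involved.

```python
def dangling_refs(rows, field, valid_keys):
    """Return the values of `field` that do not appear in `valid_keys`."""
    valid = set(valid_keys)
    return [row[field] for row in rows if row[field] not in valid]


# In-memory stand-ins for the parsed files (illustrative data only).
llm_models = [{"model_id": "Qwen/Qwen2.5-7B-Instruct"}]
languages = [
    {"code": "en-US", "primary_model": "Qwen/Qwen2.5-7B-Instruct"},
    {"code": "de-AT", "primary_model": "example/missing-model"},  # made-up id
]
countries = [
    {"code": "US", "default_language": "en-US"},
    {"code": "DE", "default_language": "de-DE"},  # de-DE is not defined above
]

# languages.json[].primary_model -> llm_models.json[].model_id
bad_models = dangling_refs(languages, "primary_model",
                           (m["model_id"] for m in llm_models))
# countries.json[].default_language -> languages.json[].code
bad_languages = dangling_refs(countries, "default_language",
                              (lang["code"] for lang in languages))

print(bad_models)
print(bad_languages)
```

The same helper applies to the other FK columns on this page, for example `country_code` and `region_code` in the mapping files.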

country_regions.json

Many-to-many mapping between countries and regions.

Field	Type	Required	Description
`country_code`	string	Yes	FK to `countries.json[].code`
`region_code`	string	Yes	FK to `regions.json[].code`

A country can belong to multiple regions (e.g., Australia maps to both ANZ and APAC).
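As an illustration of the many-to-many shape, the mapping rows can be grouped into a per-country list of regions. This is a sketch over in-memory rows; the country code `AU` for Australia follows the ISO 3166-1 convention used in countries.json, and the variable names are hypothetical.

```python
from collections import defaultdict

# Rows as they might appear in country_regions.json (illustrative data).
country_regions = [
    {"country_code": "AU", "region_code": "ANZ"},
    {"country_code": "AU", "region_code": "APAC"},
    {"country_code": "DE", "region_code": "DACH"},
]

# Group region codes by country, preserving file order.
regions_by_country = defaultdict(list)
for row in country_regions:
    regions_by_country[row["country_code"]].append(row["region_code"])

print(dict(regions_by_country))
```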

country_languages.json

Many-to-many mapping between countries and languages.

Field	Type	Required	Description
`country_code`	string	Yes	FK to `countries.json[].code`
`language_code`	string	Yes	FK to `languages.json[].code`
`is_primary`	bool	Yes	Whether this is the country’s primary language

Multilingual countries have multiple entries, typically with exactly one marked `is_primary: true`.
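Because a single primary language is only the typical case, a report is more appropriate than a hard failure. The sketch below counts primary entries per country and lists the countries that deviate. `primary_counts` is a hypothetical helper and the rows are illustrative.

```python
from collections import Counter


def primary_counts(rows):
    """Count how many entries per country are marked is_primary."""
    counts = Counter()
    for row in rows:
        # Adding 0 still registers the country, so zero-primary cases show up.
        counts[row["country_code"]] += int(row["is_primary"])
    return counts


country_languages = [
    {"country_code": "CH", "language_code": "de-CH", "is_primary": True},
    {"country_code": "CH", "language_code": "fr-CH", "is_primary": False},
    {"country_code": "BE", "language_code": "nl-BE", "is_primary": False},
    {"country_code": "BE", "language_code": "fr-BE", "is_primary": False},
    {"country_code": "US", "language_code": "en-US", "is_primary": True},
]

counts = primary_counts(country_languages)
# Countries with zero or several primary languages deserve a manual look.
unusual = {code: n for code, n in counts.items() if n != 1}
print(unusual)
```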

prompt_templates.json

Jinja2 prompt templates for LLM interactions.

Field	Type	Required	Description
`slug`	string	Yes	Unique identifier (e.g., `adaptation`, `eval-brand`)
`name`	string	Yes	Display name
`category`	string	Yes	One of: `adaptation`, `enhancement`, `concept`, `evaluation`
`description`	string	Yes	What the template does
`template`	string	Yes	Jinja2 template body with `{{ variables }}`
`expected_variables`	object	Yes	Variable schema (currently `{}`)
`version`	int	Yes	Auto-incrementing version number

See Prompt Template Editing for template editing and versioning.

segments.json

Audience segments across three categories.

Field	Type	Required	Description
`category`	string	Yes	`DEMOGRAPHIC`, `BEHAVIORAL`, or `PSYCHOGRAPHIC`
`vector`	string	Yes	Segmentation dimension (e.g., “Household Income”, “Usage Pattern”)
`value`	string	Yes	Position on the dimension (e.g., “Middle-Income”, “First-Time Users”)
`description`	string	No	Segment description
`insights`	array	No	`[{heading, points[]}]` — advertising approach guidance
`is_active`	bool	Yes	Whether segment is available

The composite key (`category`, `vector`, `value`) must be unique across all segments.
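The uniqueness rule can be checked by counting composite keys. This is a minimal sketch with a hypothetical helper name; how the project itself enforces the constraint is not shown here.

```python
from collections import Counter


def duplicate_segment_keys(segments):
    """Return every (category, vector, value) key that occurs more than once."""
    keys = Counter((s["category"], s["vector"], s["value"]) for s in segments)
    return [key for key, n in keys.items() if n > 1]


segments = [
    {"category": "DEMOGRAPHIC", "vector": "Household Income",
     "value": "Middle-Income", "is_active": True},
    {"category": "BEHAVIORAL", "vector": "Usage Pattern",
     "value": "First-Time Users", "is_active": True},
    # Same composite key as the first entry, so it is reported.
    {"category": "DEMOGRAPHIC", "vector": "Household Income",
     "value": "Middle-Income", "is_active": False},
]

print(duplicate_segment_keys(segments))
```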

personas.json

Audience personas combining geographic and segment targeting.

Field	Type	Required	Description
`name`	string	Yes	Persona name (e.g., “Budget-Conscious First-Timer”)
`description`	string	No	Persona description
`region_code`	string	No	FK to `regions.json[].code`
`country_code`	string	No	FK to `countries.json[].code`
`language_code`	string	No	FK to `languages.json[].code`
`is_active`	bool	Yes	Whether persona is available

persona_segments.json

Many-to-many mapping between personas and segments.

Field	Type	Required	Description
`persona_name`	string	Yes	FK to `personas.json[].name`
`segment_category`	string	Yes	Segment category (e.g., `DEMOGRAPHIC`)
`segment_vector`	string	Yes	Segment vector (e.g., “Household Income”)
`segment_value`	string	Yes	Segment value (e.g., “Middle-Income”)
`order_index`	int	No	Display ordering within the persona
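The three `segment_*` fields together reference a segment by its composite key. The sketch below resolves a persona's segments and orders them by `order_index`. `segments_for_persona` is a hypothetical helper, and placing rows without an `order_index` last is an assumption, since the field is optional and this page does not define a fallback.

```python
def segments_for_persona(persona_name, links, segments):
    """Resolve a persona's segments, ordered by order_index (missing last)."""
    by_key = {(s["category"], s["vector"], s["value"]): s for s in segments}
    mine = [link for link in links if link["persona_name"] == persona_name]
    # Assumption: rows without order_index sort after rows that have one.
    mine.sort(key=lambda link: (link.get("order_index") is None,
                                link.get("order_index") or 0))
    # A KeyError here means a link points at a segment that does not exist.
    return [
        by_key[(link["segment_category"], link["segment_vector"],
                link["segment_value"])]
        for link in mine
    ]


segments = [
    {"category": "DEMOGRAPHIC", "vector": "Household Income",
     "value": "Middle-Income"},
    {"category": "BEHAVIORAL", "vector": "Usage Pattern",
     "value": "First-Time Users"},
]
persona_segments = [
    {"persona_name": "Budget-Conscious First-Timer",
     "segment_category": "BEHAVIORAL", "segment_vector": "Usage Pattern",
     "segment_value": "First-Time Users", "order_index": 1},
    {"persona_name": "Budget-Conscious First-Timer",
     "segment_category": "DEMOGRAPHIC", "segment_vector": "Household Income",
     "segment_value": "Middle-Income", "order_index": 0},
]

resolved = segments_for_persona("Budget-Conscious First-Timer",
                                persona_segments, segments)
print([s["value"] for s in resolved])
```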

brands.json

Brand reference data for the brand evaluation gate.

Field	Type	Required	Description
`code`	string	Yes	Unique identifier (e.g., `LAYS`, `WALKERS`)
`name`	string	Yes	Display name
`description`	string	No	Brand overview and positioning
`guidelines`	string	No	Free-text brand voice, values, visual identity rules
`insights`	array	No	`[{heading, points[]}]` — structured brand patterns
`is_active`	bool	Yes	Whether brand is available

See Brand Configuration for usage in the evaluation gate.

example_tvspot.json

TV spot import format used by `import_tvspot`.

Top-Level Fields

Field	Type	Required	Description
`client_name`	string	Yes	Client or advertiser name
`brand_name`	string	Yes	Brand name
`script_title`	string	Yes	Campaign or script title
`total_runtime_seconds`	number	Yes	Total spot duration in seconds
`job_id`	string	Yes	Unique job identifier (e.g., `ACME-2024-001`)
`language`	string	Yes	Language code (e.g., `en-US`)
`notes`	string	No	Production notes
`script_rows`	array	Yes	Array of script row objects

Script Row Fields

Field	Type	Required	Description
`shot_number`	string	Yes	Shot identifier (e.g., `01`, `02`)
`timecode_start`	string	Yes	Timecode in `HH:MM:SS:FF` format
`duration_seconds`	number	Yes	Shot duration in seconds
`visual_text`	string	Yes	Scene description (framing, setting, talent, on-screen text)
`audio_text`	string	Yes	Audio specification (SFX, MUSIC, VO, DIALOGUE)

See Importing TV Spots (JSON) for import instructions and validation rules.
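Before running an import, a file can be pre-checked against the two tables above. The following sketch is not the importer's validation logic, which is documented separately. Both checks are assumptions drawn from the field descriptions: that `timecode_start` has four two-digit groups, and that shot durations add up to `total_runtime_seconds`. The sample values other than `job_id` and `language` are invented.

```python
import math
import re

# HH:MM:SS:FF, i.e. four two-digit groups separated by colons.
TIMECODE = re.compile(r"\d{2}:\d{2}:\d{2}:\d{2}")


def check_tvspot(spot):
    """Return a list of problems found in a parsed TV spot document."""
    problems = []
    for row in spot["script_rows"]:
        if not TIMECODE.fullmatch(row["timecode_start"]):
            problems.append(f"shot {row['shot_number']}: bad timecode_start")
    # Assumed rule: shot durations should sum to the declared total runtime.
    total = sum(row["duration_seconds"] for row in spot["script_rows"])
    if not math.isclose(total, spot["total_runtime_seconds"], abs_tol=1e-6):
        problems.append(f"durations sum to {total}, "
                        f"expected {spot['total_runtime_seconds']}")
    return problems


spot = {
    "client_name": "ACME",            # invented sample value
    "brand_name": "ACME Snacks",      # invented sample value
    "script_title": "Launch Spot",    # invented sample value
    "total_runtime_seconds": 30,
    "job_id": "ACME-2024-001",
    "language": "en-US",
    "script_rows": [
        {"shot_number": "01", "timecode_start": "00:00:00:00",
         "duration_seconds": 12, "visual_text": "Wide shot, kitchen",
         "audio_text": "MUSIC: upbeat"},
        {"shot_number": "02", "timecode_start": "00:00:12:00",
         "duration_seconds": 18, "visual_text": "Close-up, product",
         "audio_text": "VO: tagline"},
    ],
}

print(check_tvspot(spot))
```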

Insights Schema

Several models share a common `insights` structure: a list of sections, each with a `heading` and a list of `points`:

[
  {
    "heading": "Section Title",
    "points": [
      "First point",
      "Second point"
    ]
  }
]

Used by: Region, Country, Language, Segment, and Brand models. The `compose_insights()` function in `src/cw/lib/insights.py` aggregates insights from multiple sources in hierarchical order.
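To illustrate how insights from several sources might be combined, the sketch below flattens them into labelled text lines in the order given. It is not the project's `compose_insights()`, whose signature and merging rules are not shown on this page. `flatten_insights`, the source labels, and the sample data are all hypothetical.

```python
def flatten_insights(sources):
    """Flatten (label, insights) pairs into text lines, keeping source order."""
    lines = []
    for label, insights in sources:
        for section in insights:
            lines.append(f"{label}: {section['heading']}")
            lines.extend(f"- {point}" for point in section["points"])
    return lines


# Assumed hierarchy: broader sources first (region before country).
sources = [
    ("Region", [{"heading": "Media", "points": ["First point"]}]),
    ("Country", [{"heading": "Regulation",
                  "points": ["First point", "Second point"]}]),
]

for line in flatten_insights(sources):
    print(line)
```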

Relationship Diagram

erDiagram
    LLM_MODELS ||--o{ LANGUAGES : "primary_model"
    REGIONS ||--o{ COUNTRY_REGIONS : ""
    COUNTRIES ||--o{ COUNTRY_REGIONS : ""
    COUNTRIES ||--o{ COUNTRY_LANGUAGES : ""
    LANGUAGES ||--o{ COUNTRY_LANGUAGES : ""
    COUNTRIES }o--|| LANGUAGES : "default_language"
    PERSONAS }o--o| REGIONS : "region_code"
    PERSONAS }o--o| COUNTRIES : "country_code"
    PERSONAS }o--o| LANGUAGES : "language_code"
    PERSONAS ||--o{ PERSONA_SEGMENTS : ""
    SEGMENTS ||--o{ PERSONA_SEGMENTS : ""