APOSTLE
AI Video Production Pipeline
Module 02: Pre-Production

Pre-Production — Script, Storyboard & Ingredient Creation

Learn how to create every ingredient your AI video production needs — from AI-optimized scripts and character reference packages to environment libraries and composed keyframes.

18 min · Intermediate · Lesson 02 of 6

Estimated time: 18 minutes
What you'll learn: How to create every ingredient your AI video production needs before you generate a single frame of video.
Tools used: ChatGPT/Claude (scripting), Midjourney V7 (character/environment), Nano Banana Pro (composition/editing), FLUX (photorealism)


Learning Objectives

By the end of this module, you will be able to:

  • Write an AI-optimized script with shot-level descriptions that translate directly to prompts
  • Create character reference packages that maintain identity across shots
  • Build environment reference libraries that lock down your visual world
  • Compose start-frame keyframes that combine characters and environments into ready-to-animate images
  • Organize all ingredients into a production bible

Step 1: Write an AI-Optimized Script

An AI video script is different from a traditional screenplay. Traditional scripts describe dialogue and action for human actors and crew. AI video scripts describe visual compositions for image and video generation tools.

The format you need: Shot Description Script (SDS)

Each shot gets its own block with five components:

SHOT [number] — [duration]s
SCENE: [location/environment description]
ACTION: [what happens in the shot — movement, gesture, expression]
CAMERA: [camera angle, movement, lens equivalent]
AUDIO: [dialogue, music cue, SFX]
TOOL: [which AI video model to use — filled in after Module 3]

Example: Coffee Brand Commercial (30 seconds)

SHOT 1 — 6s
SCENE: Sunlit kitchen counter, marble surface, morning light from left window,
       minimal styling, single monstera plant in background
ACTION: Hands pour coffee from a ceramic pour-over into a white mug, steam rises
CAMERA: Close-up, overhead angle (bird's eye), static with subtle zoom in
AUDIO: Sound of pouring liquid, gentle ambient morning sounds
TOOL: Kling 2.6 (precise hand/object control)

SHOT 2 — 5s
SCENE: Same kitchen, wider view revealing breakfast setup
ACTION: Woman (character: Maya) picks up the mug, brings it to her lips, closes eyes
CAMERA: Medium shot, eye level, slight dolly left
AUDIO: Continued ambient, soft piano note enters
TOOL: Veo 3.1 (character consistency + cinematic quality)

SHOT 3 — 8s
SCENE: Living room with large window, city skyline visible, morning golden light
ACTION: Maya walks to the window holding the mug, looks out, subtle smile
CAMERA: Full body, tracking shot following her walk, 35mm equivalent
AUDIO: Piano melody builds gently, city ambient underneath
TOOL: Veo 3.1 (tracking movement + audio sync)

SHOT 4 — 6s
SCENE: Product shot — the coffee bag on marble, cup beside it, steam
ACTION: Static product display, steam movement only
CAMERA: Close-up, 45-degree angle, shallow depth of field
AUDIO: Music resolves, brand tagline in VO: "Begin with better."
TOOL: Kling 2.6 (product precision)

SHOT 5 — 5s
SCENE: White/brand color background
ACTION: Brand logo animation
CAMERA: Static center frame
AUDIO: Brand audio logo / stinger
TOOL: After Effects (traditional — not AI generated)
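The SDS format is regular enough to validate mechanically. Below is a hypothetical Python sketch that parses shot headers and totals the runtime, so duration drift surfaces before you generate anything. The `SAMPLE_SDS` string is a shortened illustrative stand-in, not the full commercial above.

```python
import re

# Hypothetical SDS validator: parse the "SHOT n — Ns" headers and total
# the durations, so runtime errors surface before generation begins.
SAMPLE_SDS = """\
SHOT 1 — 4s
SCENE: Sunlit kitchen counter
ACTION: Hands pour coffee into a white mug
CAMERA: Close-up, overhead, static
AUDIO: Pouring liquid, ambient morning

SHOT 2 — 3s
SCENE: Same kitchen, wider view
ACTION: Maya picks up the mug
CAMERA: Medium shot, eye level
AUDIO: Soft piano enters
"""

# accept an em dash or a plain hyphen between shot number and duration
SHOT_HEADER = re.compile(r"^SHOT\s+(\d+)\s*[—-]+\s*(\d+(?:\.\d+)?)s", re.MULTILINE)

def shot_durations(script: str) -> dict:
    """Map shot number -> duration in seconds."""
    return {int(n): float(d) for n, d in SHOT_HEADER.findall(script)}

def total_runtime(script: str) -> float:
    """Sum of all shot durations in the script."""
    return sum(shot_durations(script).values())

print(shot_durations(SAMPLE_SDS))  # {1: 4.0, 2: 3.0}
print(total_runtime(SAMPLE_SDS))   # 7.0
```

If you adopt a different header style, only the `SHOT_HEADER` pattern needs to change.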

Using LLMs to write and refine scripts:

Start with a brief to ChatGPT or Claude:

Write an AI video shot description script for a 30-second coffee brand commercial.
The brand is called "Origin" — premium, minimal, urban. Target audience is
professionals aged 28-40. The narrative follows a morning ritual. Each shot
should include SCENE, ACTION, CAMERA, and AUDIO fields. Keep it to 4-5 shots.
The total runtime must be exactly 30 seconds.

Then refine iteratively: "Make shot 3 longer and more cinematic." "Add a product close-up." "Make the camera movements more dynamic."

The LLM won't write perfect AI prompts, but it will write excellent creative briefs that you then translate into prompts. Think of this step as working with a scriptwriter, not a prompt engineer.
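If you brief an LLM often, it helps to template the brief so every project supplies the same fields. A small hypothetical helper (the function and parameter names are my own, not part of any tool):

```python
# Hypothetical brief builder: turns project parameters into the kind of
# LLM brief shown above, so every project starts from the same structure.
def build_brief(brand: str, positioning: str, audience: str,
                narrative: str, runtime_s: int, shots: str) -> str:
    return (
        f"Write an AI video shot description script for a {runtime_s}-second "
        f"{brand} commercial. The brand is positioned as {positioning}. "
        f"Target audience is {audience}. The narrative follows {narrative}. "
        f"Each shot should include SCENE, ACTION, CAMERA, and AUDIO fields. "
        f"Keep it to {shots} shots. "
        f"The total runtime must be exactly {runtime_s} seconds."
    )

brief = build_brief("Origin", "premium, minimal, urban",
                    "professionals aged 28-40", "a morning ritual", 30, "4-5")
print(brief)
```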


Step 2: Create Character Reference Packages

Character references are the most important ingredients in your pipeline. Without them, you have no consistency. With them, you can maintain a recognizable character across 50+ shots.

The Four-Image Character Package

At minimum, create four reference images per character:

Image 1: Front-Facing Headshot (Neutral)

This is your anchor image — the one you'll use as a reference input for every shot.

Midjourney prompt:
Front-facing portrait photograph of a woman in her early 30s, East Asian
features, shoulder-length dark brown hair with subtle highlights, warm brown
eyes, light makeup, neutral relaxed expression, wearing a cream linen shirt,
clean light grey background, soft diffused studio lighting, shot on Canon
EOS R5 with 85mm f/1.4 lens, natural skin texture with visible pores
--ar 2:3 --style raw --s 200 --q 2

Why these choices matter:

  • --style raw removes Midjourney's beautification filter — critical for realistic character references
  • --s 200 (low stylize) keeps the output closer to your prompt rather than Midjourney's aesthetic preferences
  • 85mm f/1.4 triggers flattering portrait compression with natural bokeh
  • natural skin texture with visible pores prevents the waxy AI skin look
  • neutral relaxed expression gives you a baseline face that can be modified for different emotions
  • clean light grey background isolates the character for easier use as a reference

Image 2: Full-Body Reference

Midjourney prompt:
Full-body photograph of [same character description], standing in a relaxed
contrapposto pose, wearing cream linen shirt, dark navy tailored trousers,
white sneakers, hands loosely at sides, clean light grey background, even
studio lighting, shot on Sony A7IV with 50mm f/2.8 lens, fashion editorial
style --ar 2:3 --style raw --s 200 --q 2

Image 3: Expression Sheet

Midjourney prompt:
Expression sheet of [same character description], four headshots in a 2x2
grid showing: top-left genuine smile with teeth, top-right thoughtful
expression looking slightly left, bottom-left laughing candidly,
bottom-right calm contemplative with closed lips, consistent lighting
and background across all four, character reference sheet style
--ar 1:1 --style raw --s 150 --q 2

Image 4: Outfit/Wardrobe Reference

If your character wears different outfits across scenes, generate each outfit:

Midjourney prompt:
Full-body fashion photograph of [same character description], wearing a
forest green cashmere sweater over white t-shirt, dark indigo jeans, brown
leather boots, same person different outfit, clean background, studio
lighting --ar 2:3 --style raw --s 200 --q 2

Using Omni Reference (--oref) for Character Lock

Once you have your anchor headshot, use Midjourney V7's Omni Reference to maintain that face in new scenes:

[Upload headshot as Omni Reference]
The woman from the reference photo sitting at a café table by a rain-streaked
window, holding a ceramic mug, afternoon light, 35mm street photography
--ow 200 --s 300 --ar 16:9

The --ow 200 weight provides strong facial fidelity while leaving enough flexibility for the new scene. Start at 100 and increase if the face drifts too much. Go above 400 only if you're using high --stylize values.

Cross-Tool Character Transfer

Your Midjourney character reference also works in other tools:

In Nano Banana Pro: Upload the headshot and prompt: "Generate a new image of this exact person [uploaded reference] sitting in a bright modern kitchen, morning sunlight from the left, medium shot, photorealistic."

In FLUX (via ComfyUI or API): Use IP-Adapter with the headshot as the identity reference. Set IP-Adapter weight to 0.7-0.85 for natural results.

For video generation (covered in Module 4): The composed keyframes you create in Step 4 already contain your character — the video model animates from that image rather than generating a new character from text.


Step 3: Build Environment Reference Libraries

Environments establish the visual world of your project. Like character references, they ensure consistency across shots that share a location.

Environment Reference Checklist

For each location in your script, create:

| Reference Type | Purpose | Example |
| --- | --- | --- |
| Wide establishing shot | Sets the overall space, scale, and atmosphere | Full kitchen view, morning light |
| Detail textures | Surface materials, patterns, finishes | Marble countertop close-up, wood grain |
| Lighting reference | Direction, color temperature, shadow quality | Window light study, golden hour reference |
| Color palette | 5-7 hex codes defining the location's palette | Extracted from the establishing shot |
| Prop details | Specific objects that appear in shots | The specific coffee mug, the pour-over device |

Example: Creating a Kitchen Environment

Establishing shot:

Nano Banana Pro prompt:
A bright modern Scandinavian kitchen, white oak cabinets, white marble
countertops with subtle grey veining, large window on the left wall letting
in soft morning sunlight, monstera plant on the counter, minimal styling,
single ceramic pour-over coffee setup, warm wood flooring, the feeling of
a calm morning, interior photography by dwell magazine, shot on Fujifilm
GFX 50S with 32mm f/4 lens

Why Nano Banana Pro for environments:

  • Native 4K resolution captures fine detail (marble veining, wood grain)
  • Reasoning-guided synthesis produces more physically accurate lighting
  • Multi-turn editing means you can adjust specific elements ("move the plant to the right," "make the light warmer")

Extracting a color palette:

Take your establishing shot and feed it to Claude or ChatGPT:

Analyze this image and extract the 6 dominant colors as hex codes.
For each color, name it and note where it appears in the image.

The output might look like:

#F5F0EB — Warm White (walls, ceiling)
#C4A882 — Oak Honey (cabinets, flooring)
#8B8680 — Warm Grey (marble veining, shadows)
#2D4A3E — Deep Forest (monstera leaves)
#E8D5C4 — Cream (linen textiles, ceramic)
#F9E4B7 — Morning Gold (sunlight patches)

Save these hex codes. Include them in your prompts for other shots in this location to maintain color consistency: "...using a warm palette of cream (#E8D5C4), oak honey (#C4A882), and morning gold (#F9E4B7)."
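If you'd rather not depend on an LLM for this step, dominant colors can also be extracted programmatically. Here is a stdlib-only sketch that quantizes pixels into coarse RGB buckets and reports the most common buckets as hex codes; a real pipeline would read the pixels from the image with a library such as Pillow, so the pixel list below is synthetic.

```python
from collections import Counter

def dominant_colors(pixels, n=6, step=32):
    """pixels: iterable of (r, g, b) tuples. Returns up to n hex codes,
    most common first, after snapping each channel to a coarse grid."""
    def quantize(color):
        # snap each channel to the centre of its bucket
        return tuple(min(255, (v // step) * step + step // 2) for v in color)
    counts = Counter(quantize(p) for p in pixels)
    return ["#%02X%02X%02X" % rgb for rgb, _ in counts.most_common(n)]

# synthetic "kitchen" pixels: mostly warm white, some oak, a little green
pixels = [(245, 240, 235)] * 60 + [(196, 168, 130)] * 30 + [(45, 74, 62)] * 10
print(dominant_colors(pixels, n=3))  # ['#F0F0F0', '#D0B090', '#305030']
```

A smaller `step` gives finer color distinctions at the cost of noisier counts.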


Step 4: Compose Keyframes (Start Frames)

Keyframes are the bridge between preparation and generation. Each keyframe is a fully composed image that serves as the first frame of a video clip. The AI video tool then animates from this starting point.

The Keyframe Composition Workflow

Character Reference + Environment Reference + Composition Direction
                            ↓
                  Composed Keyframe Image
                            ↓
              Input for Image-to-Video Generation

The best tool for composing keyframes: Nano Banana Pro (Gemini)

Why? Because you can upload multiple reference images and describe exactly how to combine them in natural language. Midjourney's --oref is powerful but limited to one reference. Nano Banana Pro accepts up to 14 references.

Example keyframe composition:

Nano Banana Pro prompt (with 3 uploaded images):

Image 1: [Character headshot — Maya]
Image 2: [Kitchen establishing shot]
Image 3: [Composition reference — medium shot of someone at a kitchen counter]

"Create a photograph of the woman from Image 1 standing at the kitchen
counter from Image 2. She is pouring coffee from a ceramic pour-over
into a white mug. The composition should match Image 3 — a medium shot
from the waist up, camera slightly below eye level. She is looking down
at the pour with a calm, focused expression. Morning light from the left
window creates a soft glow on the left side of her face. The steam from
the coffee is visible. Shot on Canon EOS R5, 50mm f/2.8, natural warm
color grading. Do not change her facial features from Image 1."

Keyframe Quality Checklist

Before moving a keyframe to the generation phase, verify:

  • Character face matches the reference (same person, not a similar person)
  • Environment matches established references (same kitchen, same lighting)
  • Composition is exactly what you want for frame 1 of the video
  • Camera angle/height is correct (the video tool will animate FROM this angle)
  • Hands and body position are natural and correct (hand errors are far harder to fix in generated video than in a still image)
  • Lighting direction is consistent with other shots in the same scene
  • Color palette is consistent with your extracted hex codes
  • Resolution is high enough (minimum 1024×1024, ideally 2K+)
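The resolution item on this checklist is easy to automate. As a sketch: PNG files store width and height at fixed byte offsets in the IHDR chunk, so you can check dimensions without an imaging library. The 1024 minimum edge follows the checklist above.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data: bytes) -> tuple:
    """Read (width, height) from a PNG's IHDR chunk.
    IHDR follows the 8-byte signature and 8-byte chunk header,
    so width and height occupy bytes 16-24 as big-endian uint32s."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height

def meets_minimum(data: bytes, min_edge: int = 1024) -> bool:
    """True if the shorter edge meets the keyframe resolution minimum."""
    w, h = png_dimensions(data)
    return min(w, h) >= min_edge
```

In practice you would read the first 24 bytes of each file in `/04-keyframes/` and flag anything under the threshold.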

Creating End Frames (Optional but Powerful)

Some AI video tools (Kling 2.6, Runway) support start-frame AND end-frame inputs. This gives you precise control over the motion path.

For a shot where the character turns from the counter to face the camera:

  • Start frame: Character facing the counter, in profile
  • End frame: Same character, same environment, now facing the camera with a smile

The video tool interpolates the motion between them. This is dramatically more controllable than text-based motion descriptions.
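As a toy illustration of what "interpolates between them" means, here is plain linear interpolation of a single value, say a character's head rotation in degrees, across a clip's frames. Real video models infer far richer, non-linear motion; this only conveys the idea that two endpoint constraints determine everything in between.

```python
def lerp_frames(start: float, end: float, n_frames: int) -> list:
    """Linearly interpolated values across n_frames, inclusive of
    both endpoints (frame 1 = start frame, frame n = end frame)."""
    if n_frames < 2:
        return [start]
    return [start + (end - start) * i / (n_frames - 1) for i in range(n_frames)]

# profile (90 degrees) to camera-facing (0 degrees) over 5 frames
print(lerp_frames(90.0, 0.0, 5))  # [90.0, 67.5, 45.0, 22.5, 0.0]
```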


Step 5: Organize Your Production Bible

Before generating any video, organize all ingredients into a structured production bible. This is your single source of truth for the entire project.

Recommended folder structure:

/project-name/
  /01-script/
    shot-description-script.md
    dialogue-script.md (if applicable)
  /02-characters/
    /maya/
      maya-headshot-front.png
      maya-full-body.png
      maya-expressions.png
      maya-outfit-2.png
      maya-character-notes.md (physical description, personality notes)
    /character-2/
      ...
  /03-environments/
    /kitchen/
      kitchen-establishing.png
      kitchen-detail-marble.png
      kitchen-color-palette.md
      kitchen-lighting-ref.png
    /living-room/
      ...
    /exterior/
      ...
  /04-keyframes/
    shot-01-keyframe.png
    shot-01-endframe.png (if using)
    shot-02-keyframe.png
    shot-03-keyframe.png
    shot-04-keyframe.png
  /05-audio/
    music-reference.mp3
    vo-script.md
    sfx-list.md
  /06-generated-clips/
    (empty — filled during Phase 2)
  /07-final/
    (empty — filled during Phase 3)
  production-notes.md
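Scaffolding this structure is easy to automate. A small sketch using Python's pathlib; the folder names mirror the tree above, and per-project character and environment subfolders would be added by hand or as extra arguments.

```python
from pathlib import Path

# Top-level folders of the production bible, mirroring the lesson's tree.
FOLDERS = [
    "01-script", "02-characters", "03-environments", "04-keyframes",
    "05-audio", "06-generated-clips", "07-final",
]

def scaffold(root: str) -> Path:
    """Create the production-bible folder tree under root and seed an
    empty production-notes.md. Safe to re-run on an existing project."""
    base = Path(root)
    for name in FOLDERS:
        (base / name).mkdir(parents=True, exist_ok=True)
    notes = base / "production-notes.md"
    if not notes.exists():
        notes.write_text("# Production Notes\n")
    return base

# scaffold("my-coffee-spot")
```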

The production-notes.md file should contain:

  • Project brief (client, objective, deliverables, deadlines)
  • Color palette hex codes
  • Typography choices (for any text overlays)
  • Aspect ratio requirements per platform
  • Which AI tools + parameter settings worked best per shot
  • Any brand guidelines or restrictions
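A starter template for production-notes.md, with hypothetical placeholders to fill in per project:

```markdown
# Production Notes: [Project Name]

## Brief
- Client:
- Objective:
- Deliverables:
- Deadlines:

## Color Palette
- [hex code] — [name] ([where it appears])

## Typography
- Overlay font:

## Aspect Ratios
- [Platform]: [ratio]

## Tool Settings per Shot
- Shot 01: [tool, parameters that worked]

## Brand Guidelines / Restrictions
- [notes]
```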

Practical Exercise

Exercise: Build a Mini Production Bible

Choose one of these 15-second scenarios:

  A) A person opening a gift box and reacting with surprise
  B) A product being placed on a table with dramatic lighting
  C) A person walking through a door from darkness into bright light

Then:

  1. Write a 3-shot SDS (Shot Description Script) using the format from Step 1
  2. Generate 1 character reference (headshot) in Midjourney or Nano Banana Pro
  3. Generate 1 environment reference in any image tool
  4. Compose 1 keyframe that combines your character and environment
  5. Extract a 5-color palette from your environment reference

You don't need to generate video yet — that's Module 4. The goal is to practice the PREPARE phase and experience how much creative control you gain before ever touching a video generation tool.

Time estimate: 45-60 minutes for the full exercise.


Key Takeaways

  • An AI-optimized script describes visual compositions, not dialogue and stage directions. Use the Shot Description Script format: Scene, Action, Camera, Audio, Tool.
  • Every character needs at minimum four reference images: headshot, full body, expression sheet, and alternative outfit. These are your consistency anchors.
  • Environments need establishing shots, detail textures, lighting references, and extracted color palettes. Consistency comes from preparation.
  • Keyframes are the bridge between preparation and generation. A composed keyframe combining character + environment + composition gives the AI video tool a clear starting point.
  • Nano Banana Pro excels at keyframe composition because it accepts multiple reference images and responds to natural language direction.
  • Organize everything in a production bible before generating. This structure is the difference between a professional pipeline and random experimentation.



Next up: Module 3: Multi-Model Routing — Which AI Video Tool for Which Shot →
