Learning Objectives
By the end of this module, you will be able to:
- Upload and use multiple reference images (up to 14) in a single generation
- Build consistent AI characters that maintain identity across dozens of images
- Create brand asset packages (product shots, lifestyle imagery, campaign visuals) with visual consistency
- Use the JSON Visual DNA method to extract and replicate styles systematically
- Generate text-heavy commercial assets (packaging, signage, social posts) with accurate rendering
The 14-Reference Image System
Nano Banana Pro's most powerful professional feature is its reference system: you can upload up to 14 reference images alongside your text prompt. The model considers all references together, so a single request can carry direction for character, product, style, environment, and composition at once.
What you can combine in a single prompt:
Reference slots (up to 14 total):
├── Character references (faces, body, wardrobe) → up to 4-5 people
├── Product references (logos, packaging, items) → up to 10 objects
├── Style references (mood, color palette, grain) → 1-3 images
├── Environment references (location, lighting) → 1-3 images
└── Composition references (layout, framing) → 1 image
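The slot budget above can be checked before a request is sent. The following is a minimal sketch: the role names, `ROLE_LIMITS`, and `check_reference_budget` are hypothetical and simply mirror the upper bounds listed in the tree. They are not part of any official API.

```python
from collections import Counter

# Upper bounds copied from the slot tree above; the names are illustrative.
MAX_REFERENCES = 14
ROLE_LIMITS = {
    "character": 5,
    "product": 10,
    "style": 3,
    "environment": 3,
    "composition": 1,
}


def check_reference_budget(roles):
    """Return a list of problems for a planned list of reference roles."""
    problems = []
    if len(roles) > MAX_REFERENCES:
        problems.append(f"{len(roles)} references exceeds the total of {MAX_REFERENCES}")
    for role, count in Counter(roles).items():
        if role not in ROLE_LIMITS:
            problems.append(f"unknown role: {role}")
        elif count > ROLE_LIMITS[role]:
            problems.append(f"{count} {role} references exceeds the limit of {ROLE_LIMITS[role]}")
    return problems


# One role per planned reference image, in upload order.
plan = ["character", "environment", "product", "style", "composition", "style"]
print(check_reference_budget(plan))  # prints []
```

An empty list means the plan fits the budget; any other result names the role or total that needs trimming.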
Example: Brand lifestyle shoot with 6 references
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_KEY")

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        # Text prompt first
        "Create a lifestyle photograph for this skincare brand. "
        "The woman from Image 1 is sitting on the sofa from Image 2, "
        "holding the product from Image 3. The lighting and color palette "
        "should match Image 4. The composition should be similar to Image 5. "
        "The overall mood should match Image 6: warm, calm, aspirational. "
        "Shot on Canon EOS R5, 50mm f/2.8, natural light from window.",
        # Image references
        Image.open("ref-character-headshot.png"),   # Image 1
        Image.open("ref-environment-sofa.png"),     # Image 2
        Image.open("ref-product-bottle.png"),       # Image 3
        Image.open("ref-lighting-palette.png"),     # Image 4
        Image.open("ref-composition-layout.png"),   # Image 5
        Image.open("ref-mood-warmth.png"),          # Image 6
    ],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="4:5",
            image_size="2K",
        ),
    ),
)
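The request above returns the image inside the response object rather than writing a file. Below is a minimal sketch for saving it, assuming the response exposes `candidates[0].content.parts` with image bytes in `part.inline_data.data`. The helper name `save_image_parts` and the file-name prefix are hypothetical. The helper only reads attributes, so a stand-in object is used here in place of a live API call.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_image_parts(response, prefix="output"):
    """Write every inline image part of a response to disk; return the paths."""
    saved = []
    for part in response.candidates[0].content.parts:
        inline = getattr(part, "inline_data", None)
        if inline is None or not inline.data:
            continue  # text parts carry no image bytes
        # Derive the extension from the MIME type, e.g. "image/png" -> "png".
        extension = (inline.mime_type or "image/png").split("/")[-1]
        path = Path(f"{prefix}-{len(saved) + 1}.{extension}")
        path.write_bytes(inline.data)
        saved.append(path)
    return saved


# Stand-in response for illustration; a real run would pass `response` from above.
fake_response = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
    SimpleNamespace(text="Here is your image.", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(mime_type="image/png", data=b"\x89PNG")),
]))])

out_prefix = str(Path(tempfile.mkdtemp()) / "lifestyle")
saved_paths = save_image_parts(fake_response, prefix=out_prefix)
print([path.name for path in saved_paths])  # prints ['lifestyle-1.png']
```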
Critical rule: Label your images explicitly. Always say "the woman from Image 1" or "the product in Image 3" — don't assume the model knows which reference maps to which role. Ambiguity causes the model to blend references inappropriately.
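The labeling rule can be checked mechanically before sending a request. This is a small sketch, not a required step: `check_image_labels` is a hypothetical helper that compares the "Image N" labels in a prompt against the number of uploaded references.

```python
import re


def check_image_labels(prompt, reference_count):
    """Compare 'Image N' labels in the prompt with the number of references."""
    mentioned = {int(n) for n in re.findall(r"\bImage (\d+)\b", prompt)}
    expected = set(range(1, reference_count + 1))
    return {
        "unlabeled": sorted(expected - mentioned),  # uploaded but never named
        "dangling": sorted(mentioned - expected),   # named but never uploaded
    }


prompt = (
    "The woman from Image 1 is sitting on the sofa from Image 2, "
    "holding the product from Image 3. Match the palette of Image 5."
)
print(check_image_labels(prompt, reference_count=4))
# prints {'unlabeled': [4], 'dangling': [5]}
```

Both lists should be empty before the request goes out: an unlabeled reference invites blending, and a dangling label points at an image the model never received.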
Building Consistent Characters
Character consistency, meaning the same person's face, build, and features maintained across multiple images, is the #1 professional challenge in AI image generation. Nano Banana Pro offers three approaches.
Method 1: Multi-Turn Conversation (Simplest)
Generate your character in Turn 1 with an extremely detailed description, then reference "the same person" in subsequent turns.
Turn 1 — Establish the character:
"Create a photorealistic close-up portrait of a woman named Elena.
She is 32 years old with an oval face, high cheekbones, wide-set
hazel eyes, a small mole on her left cheek just below the cheekbone,
short wavy dark-brown hair cut to chin length with a slight side part,
light olive skin with natural freckles across the bridge of her nose.
Neutral expression, looking directly at camera. Clean grey background.
Shot on Sony A7IV, 85mm f/1.8. Natural, editorial."
Turn 2 — Same character, new scene:
"Now show Elena sitting at a desk in a bright modern office. She's
reviewing papers, wearing a navy blazer over a white t-shirt. She
has a slight concentrating frown. Medium shot, natural window light
from the left. Same person, same facial features."
Turn 3 — Same character, different mood:
"Elena is now outdoors at a rooftop café during golden hour. She's
laughing genuinely, looking off-camera to the right. Wearing a
casual olive green linen shirt. Warm evening light. Same person."
Consistency tips for multi-turn:
- Repeat key distinguishing features in early turns (the mole, the freckle pattern, the chin-length hair)
- Add "Do not change facial features or proportions" to every turn
- If drift occurs, upload the best image from an earlier turn as a reference and say "This is what Elena looks like — maintain this exact face"
- Keep sessions to 5-8 turns maximum. Beyond that, start a new session with the best image as a reference.
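The tips above can be folded into a small prompt-building wrapper. This is a sketch under stated assumptions: `TurnPlanner` and `LOCK_PHRASE` are hypothetical names, and the class only builds prompt strings and counts turns. Sending each prompt through a chat session is left out.

```python
from dataclasses import dataclass, field

LOCK_PHRASE = "Do not change facial features or proportions."


@dataclass
class TurnPlanner:
    """Builds per-turn prompts that repeat a character's anchor features."""
    name: str
    anchor_features: list
    max_turns: int = 8  # upper end of the 5-8 turn guideline above
    turns_used: int = field(default=0, init=False)

    def next_prompt(self, scene):
        if self.turns_used >= self.max_turns:
            raise RuntimeError(
                "Turn budget spent: start a new session with the best image as a reference."
            )
        self.turns_used += 1
        features = ", ".join(self.anchor_features)
        return f"{scene} Same person as before: {self.name} ({features}). {LOCK_PHRASE}"


elena = TurnPlanner(
    "Elena",
    ["mole on left cheek", "freckles across nose bridge", "chin-length wavy hair"],
)
print(elena.next_prompt("Now show Elena sitting at a desk in a bright modern office."))
```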
Method 2: Identity Header Anchoring (Most Reliable)
Create a compact text header that defines your character precisely, then paste it into every new generation prompt.
The Identity Header format:
SUBJECT: Elena Marchetti
FACE: Oval shape, high cheekbones, wide-set hazel eyes with amber flecks,
small mole on left cheek below cheekbone, slightly arched dark brows,
medium-full lips, straight nose with narrow bridge
HAIR: Dark brown, wavy, chin-length bob with side part, natural texture
SKIN: Light olive, natural freckles across nose bridge, warm undertone
BUILD: Average, 5'7", slender athletic proportions
AGE APPEARANCE: Early 30s
Paste this header at the beginning of every prompt involving Elena. The redundancy is intentional — it forces the model to attend to specific features rather than drifting toward generic attractiveness.
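Pasting by hand invites accidental edits to the header. One way to avoid that, sketched below, is to store the header as data and render it in a fixed order. The function names and the shortened field values are illustrative only.

```python
IDENTITY_FIELDS = ("SUBJECT", "FACE", "HAIR", "SKIN", "BUILD", "AGE APPEARANCE")


def render_identity_header(identity):
    """Render the header lines in a fixed order; fail on missing fields."""
    missing = [key for key in IDENTITY_FIELDS if not identity.get(key)]
    if missing:
        raise ValueError(f"identity header is missing: {', '.join(missing)}")
    return "\n".join(f"{key}: {identity[key]}" for key in IDENTITY_FIELDS)


def with_identity(identity, scene_prompt):
    """Prefix a scene prompt with the identity header."""
    return f"{render_identity_header(identity)}\n\n{scene_prompt}"


# Values abbreviated from the Elena header above.
elena = {
    "SUBJECT": "Elena Marchetti",
    "FACE": "Oval shape, high cheekbones, wide-set hazel eyes with amber flecks",
    "HAIR": "Dark brown, wavy, chin-length bob with side part",
    "SKIN": "Light olive, natural freckles across nose bridge",
    "BUILD": "Average, 5'7\", slender athletic proportions",
    "AGE APPEARANCE": "Early 30s",
}
print(with_identity(elena, "Elena is reading on a park bench in soft overcast light."))
```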
Method 3: Multi-Reference Character Lock (Most Powerful)
Upload 3-4 images of the character from different angles and expressions. The model triangulates identity from multiple viewpoints.
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        # Upload 4 character references
        Image.open("elena-front.png"),          # Front-facing neutral
        Image.open("elena-three-quarter.png"),  # 3/4 angle
        Image.open("elena-smiling.png"),        # Different expression
        Image.open("elena-profile.png"),        # Side profile
        # Prompt specifying new scene
        "This is Elena — the same woman shown in all four reference images. "
        "Generate a new photograph of Elena standing at a kitchen counter "
        "making coffee. She's wearing a cream cable-knit sweater. Morning "
        "light from a window on her left. She looks relaxed and content. "
        "Maintain her exact facial features, hair, and skin from the references. "
        "Medium shot, Fujifilm GFX 50S, 55mm f/2.8.",
    ],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(aspect_ratio="4:5", image_size="2K"),
    ),
)
Using 3-4 references from different angles gives the model enough information to reconstruct the character accurately in any new pose or environment. This approach outperforms single-reference methods for extended projects.
Creating Brand Asset Packages
For commercial work, you need not just consistent characters but consistent brand aesthetics — color palettes, lighting styles, material textures, and compositional language that maintain brand identity across dozens of assets.
The Brand Style Guide Method
Create a master prompt prefix that encodes your brand's visual DNA:
BRAND: Origin Coffee
AESTHETIC: Warm minimal Scandinavian, Kinfolk magazine meets Cereal magazine
PALETTE: Cream (#F5F0EB), Warm Oak (#C4A882), Espresso (#3C2415),
Sage (#8B9A82), Morning Gold (#F9E4B7)
MATERIALS: White oak, marble with grey veining, matte ceramic, linen
LIGHTING: Always natural light — morning sun preferred, soft and directional
MOOD: Calm, intentional, unhurried, premium-but-approachable
CAMERA: Fujifilm GFX 50S, 55mm f/2.8 (default), Kodak Portra color science
AVOID: Harsh shadows, cluttered compositions, saturated colors, plastic textures
Paste this prefix before every asset generation prompt. This ensures every image — product shots, lifestyle scenes, social media graphics — shares the same visual language.
Product Photography Workflow
Prompt 1 — Hero product shot:
[Brand style guide prefix]
"Origin Coffee 250g bag standing upright on a white marble surface.
The bag is kraft paper with a minimal black label reading 'ORIGIN'
in clean sans-serif type. A small ceramic cup of black coffee sits
beside it, steam visible. Soft morning light from the left. Shallow
depth of field. Product photography, clean, editorial."
Prompt 2 — Lifestyle context:
[Brand style guide prefix]
"The Origin Coffee bag from the previous image, now placed on a
kitchen counter next to a ceramic pour-over. A person's hands
are visible pouring water from a gooseneck kettle. Warm morning
light, out-of-focus kitchen background. Lifestyle product photography."
Prompt 3 — Social media flat lay:
[Brand style guide prefix]
"Flat lay overhead shot: Origin Coffee bag centered, surrounded by
scattered coffee beans, a ceramic mug, a small linen napkin, and
a sprig of dried eucalyptus. White oak surface. Even soft lighting,
no harsh shadows. Instagram flat lay style, 1:1 square aspect ratio."
Each image maintains the same brand DNA because the style guide prefix anchors the visual direction.
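The prefix-plus-brief pattern is easy to script so that every asset in a batch receives an identical prefix. The sketch below uses an abbreviated version of the Origin Coffee prefix and also flags malformed hex codes. `invalid_palette_codes` and `build_asset_prompts` are hypothetical helper names.

```python
import re

HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")

# Abbreviated from the style guide above.
BRAND_PREFIX = """BRAND: Origin Coffee
AESTHETIC: Warm minimal Scandinavian
PALETTE: Cream (#F5F0EB), Warm Oak (#C4A882), Espresso (#3C2415)
LIGHTING: Always natural light, soft and directional
AVOID: Harsh shadows, cluttered compositions, saturated colors"""


def invalid_palette_codes(prefix):
    """Return any '#...' tokens in the prefix that are not 6-digit hex colors."""
    return [token for token in re.findall(r"#\w+", prefix) if not HEX_COLOR.match(token)]


def build_asset_prompts(prefix, asset_briefs):
    """Prepend the same prefix to every asset brief, keyed by asset name."""
    bad = invalid_palette_codes(prefix)
    if bad:
        raise ValueError(f"malformed palette codes: {bad}")
    return {name: f"{prefix}\n\n{brief}" for name, brief in asset_briefs.items()}


briefs = {
    "hero": "Origin Coffee 250g bag standing upright on a white marble surface.",
    "lifestyle": "The Origin Coffee bag on a kitchen counter next to a ceramic pour-over.",
    "flat_lay": "Flat lay overhead shot: Origin Coffee bag centered on white oak.",
}
prompts = build_asset_prompts(BRAND_PREFIX, briefs)
print(prompts["hero"])
```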
The JSON Visual DNA Method
For maximum precision and reusability, extract the visual parameters of any reference image as structured JSON, then feed that JSON to Nano Banana Pro for style replication.
Step 1: Extract Visual DNA
Upload a reference image to Gemini and prompt:
Analyze this image and extract its visual characteristics as a JSON
context profile. Include these categories: scene (environment, background,
surface), camera (focal_length, aperture, focus, distance), lighting
(type, direction, quality, color_temperature), color_grading (palette
with hex codes, saturation, warmth), mood (3-5 keywords), and
imperfections (grain, vignette, lens artifacts).
Example output:
{
  "scene": {
    "environment": "indoor_studio",
    "background": "seamless_paper",
    "background_color": "#E8E2DA",
    "surface": "white_oak_table",
    "props": ["ceramic_vase", "dried_flowers"]
  },
  "camera": {
    "focal_length_mm": 85,
    "aperture": "f/2.0",
    "focus": "selective_center",
    "perspective": "eye_level",
    "distance": "medium_close"
  },
  "lighting": {
    "primary": "window_natural",
    "direction": "upper_left_45deg",
    "quality": "soft_diffused",
    "fill": "subtle_bounce",
    "color_temperature_K": 5200,
    "contrast": "low_medium"
  },
  "color_grading": {
    "palette": ["#F5F0EB", "#C4A882", "#8B8680", "#3C2415", "#E8D5C4"],
    "saturation": "slightly_desaturated",
    "warmth": "warm",
    "blacks": "slightly_lifted",
    "highlights": "soft_roll_off"
  },
  "mood": ["calm", "minimal", "premium", "natural"],
  "imperfections": {
    "grain": "very_fine",
    "vignette": "subtle_10pct",
    "chromatic_aberration": "none"
  }
}
Step 2: Generate from the JSON
Generate an image using this visual DNA profile:
[paste the JSON]
Subject: A woman in her 30s sitting at a desk writing in a notebook.
She wears a cream linen shirt. The desk has a small ceramic cup of tea.
Apply ALL parameters from the JSON exactly — match the lighting direction,
color palette, camera settings, grain, and mood precisely.
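The two steps can be connected in code so that a stored profile is validated once and reused. The sketch below assumes the extraction reply is JSON, possibly wrapped in a Markdown code fence. `parse_visual_dna`, `build_generation_prompt`, and `REQUIRED_CATEGORIES` are hypothetical names, with the categories taken from the Step 1 prompt.

```python
import json

REQUIRED_CATEGORIES = ("scene", "camera", "lighting", "color_grading", "mood", "imperfections")
FENCE = "`" * 3  # Markdown code-fence marker


def parse_visual_dna(model_text):
    """Parse a profile from model output, tolerating a Markdown code fence."""
    text = model_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence on.
        text = text.split("\n", 1)[1].rsplit(FENCE, 1)[0]
    profile = json.loads(text)
    missing = [key for key in REQUIRED_CATEGORIES if key not in profile]
    if missing:
        raise ValueError(f"profile is missing categories: {missing}")
    return profile


def build_generation_prompt(profile, subject):
    """Compose the Step 2 prompt from a stored profile and a new subject."""
    return (
        "Generate an image using this visual DNA profile:\n"
        f"{json.dumps(profile, indent=2)}\n"
        f"Subject: {subject}\n"
        "Apply ALL parameters from the JSON exactly."
    )


# Stand-in for a model reply; a real profile would carry the full fields.
raw = (
    FENCE + "json\n"
    '{"scene": {}, "camera": {}, "lighting": {}, "color_grading": {}, '
    '"mood": ["calm"], "imperfections": {}}\n' + FENCE
)
profile = parse_visual_dna(raw)
print(build_generation_prompt(profile, "A woman in her 30s writing in a notebook."))
```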
Why This Matters for Production
JSON Visual DNA profiles are reusable brand assets. Create one profile per brand, per campaign, or per visual world. Anyone on your team can generate on-brand imagery by pasting the JSON — no subjective interpretation required. This is how you scale AI image production without losing consistency.
Text-Heavy Commercial Assets
Nano Banana Pro's text rendering makes it well suited to assets that include readable text.
Signage and Environmental Text
A photograph of a boutique coffee shop storefront. The shop sign
above the door reads "ORIGIN" in clean black sans-serif capital
letters on a pale wood panel. The door is painted forest green.
Window displays show minimal product arrangements. Afternoon
sun casts soft shadows. Street photography style.
Social Media Graphics with Text
A minimalist Instagram post graphic for Origin Coffee.
Clean cream background (#F5F0EB). Large centered text reading:
"Begin with better." in a modern serif font, dark espresso
brown (#3C2415). Below the text, a small simple line illustration
of a coffee cup. 1:1 square format. Elegant, premium, quiet.
Product Packaging Mockups
A realistic mockup of a coffee bag label. The label is kraft paper
texture. At the top: "ORIGIN" in black sans-serif capitals, tracking
slightly wide. Below: "Single Origin Colombia" in smaller text.
At the bottom: "Net Wt. 250g" and a small origin certification mark.
The typography is clean, modern, minimal. Photorealistic rendering
of printed text on paper texture.
Text rendering tips:
- Keep text to 1-3 lines for highest accuracy
- Specify the exact text in quotation marks
- Describe the font style (serif, sans-serif, monospace, script)
- Short text (1-5 words) renders much more reliably than paragraphs
- For critical text accuracy, generate at 2K+ resolution
- If text is slightly wrong, use multi-turn editing: "Fix the text to read exactly: ORIGIN"
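These tips can be applied as a pre-flight check on the text you plan to render. The helpers below are a hypothetical sketch: the thresholds come from the list above, and the prompt wording is only one possible phrasing.

```python
def text_render_warnings(text):
    """Flag text that exceeds the length guidelines listed above."""
    warnings = []
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) > 3:
        warnings.append(f"{len(lines)} lines; keep to 1-3 lines")
    word_count = len(text.split())
    if word_count > 5:
        warnings.append(f"{word_count} words; 1-5 words render most reliably")
    return warnings


def text_prompt_clause(text, font_style):
    """Build a clause that quotes the exact text and names the font style."""
    return f'Text reading exactly: "{text}" in a {font_style} font.'


def fix_text_prompt(text):
    """Follow-up prompt for multi-turn correction of misrendered text."""
    return f"Fix the text to read exactly: {text}"


print(text_render_warnings("Begin with better."))  # prints []
print(text_prompt_clause("Begin with better.", "modern serif"))
```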
Practical Exercise
Exercise: Create a 5-Image Brand Asset Package
Choose a fictional brand (or use the Origin Coffee example). Create:
- Brand Style Guide prompt prefix — palette, materials, lighting, mood, camera
- Hero product shot — the product alone, beautifully lit
- Lifestyle shot — the product in use, with a person
- Flat lay — overhead, styled arrangement
- Social media graphic — with brand name text rendered
- Storefront/signage — the brand name in an environmental context
Use the same Brand Style Guide prefix for all 5 images. Evaluate: do they feel like they belong to the same brand? Where does consistency break? What would you adjust?
Key Takeaways
- Up to 14 reference images can be used simultaneously; label each explicitly ("the woman from Image 1")
- Character consistency works best with 3-4 reference photos from different angles, plus a detailed Identity Header
- Brand Style Guide prefixes ensure every generated asset shares the same visual DNA — paste before every prompt
- JSON Visual DNA profiles are reusable, shareable brand assets that enable team-scale consistency
- Text rendering is most reliable for short text: specify the exact content in quotes and fix errors via multi-turn editing
- Multi-turn editing lets you refine progressively — one change per turn for maximum control