APOSTLE
Google's AI Creative Suite
Module 01: Nano Banana Pro Fundamentals

Photorealism, Multi-Turn Editing, and Search Grounding

Learn how Nano Banana Pro differs from every other image generator and master the prompting techniques that unlock its full potential for professional creative work.

15 min · Beginner · Lesson 01 of 5

Learning Objectives

By the end of this module, you will be able to:

  • Explain what makes Nano Banana Pro architecturally different from Midjourney or FLUX
  • Access Nano Banana Pro through both the Gemini app and Google AI Studio
  • Write prompts that leverage Nano Banana Pro's reasoning capabilities
  • Use multi-turn editing to refine images conversationally
  • Apply Google Search grounding for factually accurate visual content
  • Control output resolution, aspect ratio, and quality settings

What Makes Nano Banana Pro Different

Most AI image generators work like this: you type a prompt, a diffusion model interprets it statistically, and you get an image based on pattern matching against training data. The model doesn't "understand" your prompt — it maps words to visual patterns.

Nano Banana Pro works differently. It's built on the Gemini 3 Pro architecture — a multimodal language model that can REASON about images before generating them. This means it can plan compositions, understand spatial relationships, simulate physics, and follow complex multi-part instructions.

The practical differences you'll notice immediately:

1. It follows complex instructions. Ask Midjourney to "place a red coffee mug on the left side of a marble counter with the handle facing right, next to a folded newspaper with today's date visible" and you'll get... an approximation. Ask Nano Banana Pro the same thing and it will plan the spatial layout, orient the handle correctly, and render readable text on the newspaper.

2. It generates accurate text. Nano Banana Pro achieves roughly 94% text accuracy across languages. Logos, signage, labels, headlines, and multi-line text blocks render cleanly. This alone makes it the tool of choice for any commercial work involving text-in-image.

3. It edits conversationally. Generate an image, then say "move the plant to the right" or "change her shirt to blue" or "make it sunset instead of noon." Nano Banana Pro understands context across conversation turns and modifies the existing image rather than generating a new one from scratch.

4. It accepts up to 14 reference images. Upload character photos, product shots, brand guidelines, color palettes, and style references — all in a single prompt. The model considers all references simultaneously when generating.
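Via the Gemini API, reference images are simply interleaved with the text prompt in the `contents` list. A minimal sketch of how you might assemble such a request, assuming the google-genai Python SDK (the helper name and the explicit 14-image check are our own, not part of the SDK):

```python
from PIL import Image

MAX_REFERENCE_IMAGES = 14  # documented cap for Nano Banana Pro

def build_contents(prompt: str, reference_paths: list[str]) -> list:
    """Assemble a mixed text+image contents list for generate_content.

    The google-genai SDK accepts PIL images directly alongside strings,
    so references and the prompt travel in one request.
    """
    if len(reference_paths) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"At most {MAX_REFERENCE_IMAGES} reference images are supported"
        )
    images = [Image.open(p) for p in reference_paths]
    return [prompt, *images]
```

The result can be passed straight to `client.models.generate_content(model=..., contents=build_contents(...))`.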


Accessing Nano Banana Pro

Method 1: Gemini App (Easiest)

Open gemini.google.com or the Gemini mobile app.

  1. Start a new conversation
  2. Type your image generation prompt
  3. Gemini automatically uses Nano Banana Pro for image requests

For the highest quality output, select the "2.5 Pro with Deep Research" or "Gemini Pro" model in the model selector. The free tier gives you access to image generation with some daily limits.

Pro tip: In the Gemini app, you can toggle to "Create Images → Thinking" mode. This activates the reasoning pipeline — the model plans the composition before generating, producing more accurate results for complex scenes.

Method 2: Google AI Studio (Developer-Friendly)

Open aistudio.google.com.

  1. Create a new prompt
  2. Select model: gemini-3-pro-image-preview (Nano Banana Pro) or gemini-3.1-flash-image-preview (Nano Banana 2)
  3. In the system instruction, add: Always respond with both text and images when the user asks for visual content.
  4. Set response modality to include Images
  5. Type your prompt and run

AI Studio is free and gives you direct control over model parameters, system instructions, and output settings. It's the best environment for experimentation before moving to the API.

Method 3: Gemini API (Production)

For programmatic access and integration into workflows:

from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="A photorealistic portrait of a woman in golden hour light, "
             "shot on Hasselblad X2D, 80mm f/2.8, warm film grain",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="3:4",
            image_size="2K",       # Options: 1K, 2K, 4K
        )
    )
)

# Extract and save the image. In the Python SDK, inline_data.data is
# already raw bytes, so no base64 decoding is needed.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        img = Image.open(BytesIO(part.inline_data.data))
        img.save("output.png")
        print(f"Generated: {img.size[0]}x{img.size[1]}")

API keys are free from aistudio.google.com/apikey. Free tier includes generous daily quotas. Paid tier pricing starts at approximately $0.04 per image at 1K resolution and scales with size and model.
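Those figures make the cost of an iteration workflow easy to ballpark. The rates below are illustrative placeholders derived from the approximate pricing above, not official numbers:

```python
# Illustrative per-image rates in USD; check official pricing before budgeting.
ASSUMED_RATES = {"1K": 0.04, "2K": 0.08, "4K": 0.12}

def estimate_cost(drafts_at_1k: int, finals_at_4k: int) -> float:
    """Estimate spend for an iterate-at-1K, finalize-at-4K workflow."""
    return (drafts_at_1k * ASSUMED_RATES["1K"]
            + finals_at_4k * ASSUMED_RATES["4K"])

# Ten 1K drafts plus two 4K finals under the assumed rates.
print(f"${estimate_cost(10, 2):.2f}")  # → $0.64
```

Even under rough assumptions, the pattern holds: iterating cheaply at 1K and reserving 4K for final selects keeps costs predictable.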


The Seven Principles of Nano Banana Pro Prompting

Google's own prompting research, combined with our production experience, reveals seven principles that produce the best results.

Principle 1: Think Like a Photographer, Not a Prompter

Nano Banana Pro was trained heavily on photographic datasets with rich EXIF metadata. It responds powerfully to the language of photography — camera models, lens specifications, lighting setups, film stocks, and compositional terms.

Generic prompt (weak):

A woman sitting at a café

Photographic prompt (strong):

A candid photograph of a woman in her late 20s sitting at a small round
marble café table in Paris. She's reading a paperback book, holding a
small espresso cup. Afternoon sidelight from a window creates soft shadows
across the table. Shot on Leica M11 with 35mm Summilux f/1.4, shallow
depth of field, Kodak Portra 400 color palette, natural skin texture,
editorial street photography style.

The second prompt gives the model a complete visual brief — subject, action, environment, lighting, camera, lens, depth of field, film stock, and style. Every technical term narrows the output toward a specific aesthetic.
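That anatomy can be captured in a small helper so every prompt covers the same bases. This is purely a convenience sketch; the function and its field names are our own, not part of any SDK:

```python
def photographic_prompt(subject: str, environment: str, lighting: str,
                        camera: str, lens: str, style: str) -> str:
    """Compose a full-sentence photographic brief from its components."""
    return (
        f"{subject} in {environment}. {lighting}. "
        f"Shot on {camera} with {lens}, {style}."
    )

prompt = photographic_prompt(
    subject="A candid photograph of a woman in her late 20s reading at a café table",
    environment="a small Parisian café",
    lighting="Afternoon sidelight from a window creates soft shadows",
    camera="Leica M11",
    lens="35mm Summilux f/1.4",
    style="shallow depth of field, Kodak Portra 400 color palette",
)
print(prompt)
```

The point is not the code itself but the checklist it enforces: if any field is hard to fill in, the prompt is probably underspecified.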

Principle 2: Describe the Scene Narratively

Unlike Midjourney (which responds well to keyword lists), Nano Banana Pro performs best with natural language descriptions that tell a story. The model's language understanding is its strength — use full sentences, not comma-separated tags.

Keyword approach (less effective in Nano Banana):

woman, café, Paris, afternoon, reading, espresso, warm light, cinematic

Narrative approach (more effective):

A quiet afternoon in a Parisian café. A woman in her late 20s is absorbed
in a paperback novel, her espresso growing cold beside her. Warm afternoon
light filters through lace curtains, casting soft patterns on the marble
table. The scene feels intimate and unhurried, as if captured by a street
photographer who noticed this private moment.

The narrative gives the model emotional context and atmospheric direction that keywords alone cannot convey.

Principle 3: Specify Resolution and Aspect Ratio

Nano Banana Pro can generate up to native 4K (4096×4096), but you need to request it explicitly.

Via the API:

image_config=types.ImageConfig(
    aspect_ratio="16:9",    # Supported: 1:1, 2:3, 3:2, 3:4, 4:3,
                            # 4:5, 5:4, 9:16, 16:9, 21:9
    image_size="4K",        # Options: 1K, 2K, 4K
)

Via prompt (in Gemini app): Add to any prompt: "Generate at 4K resolution in 16:9 aspect ratio."

Nano Banana 2 (the Flash variant) adds extreme aspect ratios: 1:4, 4:1, 1:8, 8:1 — useful for social media stories, banners, and panoramic content.

Cost consideration: 4K images cost roughly 2-3× more than 1K images via the API. Generate at 1K for iteration, then re-generate your final selects at 4K.
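One way to enforce the iterate-at-1K, finalize-at-4K rule in code is a tiny config helper. The returned dict mirrors the `ImageConfig` fields shown above; the `final` flag and the function itself are our own convention:

```python
def image_settings(aspect_ratio: str = "16:9", final: bool = False) -> dict:
    """Return image settings: cheap 1K drafts, 4K only for final selects."""
    supported = {"1:1", "2:3", "3:2", "3:4", "4:3",
                 "4:5", "5:4", "9:16", "16:9", "21:9"}
    if aspect_ratio not in supported:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio}")
    return {"aspect_ratio": aspect_ratio,
            "image_size": "4K" if final else "1K"}
```

During iteration you call `image_settings("3:4")`; only the approved composition gets re-run with `image_settings("3:4", final=True)`.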

Principle 4: Use the Thinking Mode for Complex Scenes

When your prompt involves multiple elements, spatial relationships, or precise composition, activate the reasoning pipeline.

In the Gemini app: Select the model with "Thinking" capability.

Via API: Use the gemini-3-pro-image-preview model (Pro models include deeper reasoning by default).

Why it matters: For a prompt like "Three friends sitting around a campfire at dusk, the one on the left playing guitar, the middle one roasting a marshmallow, the one on the right laughing," the thinking mode will plan spatial positions, hand placements, and interaction points BEFORE generating. Without thinking mode, you're more likely to get merged limbs, incorrect positioning, or missing elements.

Principle 5: Layer in Technical Detail Progressively

Don't try to include every specification in a single prompt. Start broad, then refine:

Generation 1 (broad concept):

A luxury skincare product bottle on a white marble surface,
soft studio lighting, minimal composition, premium feel

Review output → identify what to refine

Generation 2 (add specifics via multi-turn edit):

Make the bottle glass with a frosted finish and gold cap.
Add subtle water droplets on the surface. Make the lighting
softer with a slight warm tint. The marble should have
thin grey veining.

Generation 3 (final polish):

Increase the depth of field so the background is slightly
blurred. Add a subtle reflection of the bottle on the marble
surface. Make the gold cap slightly more matte, less shiny.

This iterative approach produces better results than cramming everything into one enormous prompt, because each turn lets you evaluate and correct.

Principle 6: Use Negative Instructions Explicitly

Nano Banana Pro understands "do not" instructions reliably:

A professional headshot of a man in his 40s, clean-shaven,
wearing a navy suit and white shirt, against a light grey
background. Soft, even studio lighting.

Do not include: glasses, facial hair, tie, jewelry,
distracting background elements, text or watermarks.

This is more reliable than Midjourney's --no parameter because the language model processes negatives with actual comprehension rather than attempting to suppress diffusion patterns.
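If you generate prompts programmatically, appending the exclusion block with a helper keeps its wording consistent. A small sketch (our own utility, not an SDK feature):

```python
def with_exclusions(prompt: str, exclusions: list[str]) -> str:
    """Append an explicit 'Do not include' block to a prompt."""
    if not exclusions:
        return prompt
    return f"{prompt}\n\nDo not include: {', '.join(exclusions)}."

p = with_exclusions(
    "A professional headshot of a man in his 40s against a light grey background.",
    ["glasses", "facial hair", "jewelry", "text or watermarks"],
)
print(p)
```

Because the model reads the exclusions as language, the phrasing matters less than the explicitness: name each unwanted element rather than relying on vague terms like "clutter."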

Principle 7: Anchor with Real-World References

Nano Banana Pro's training includes awareness of real publications, photography styles, and visual traditions. Naming them anchors the output to specific aesthetics:

Style anchors that work well:
- "In the style of Dwell magazine interior photography"
- "Kinfolk magazine editorial aesthetic"
- "National Geographic documentary photography"
- "Architectural Digest feature photography"
- "Vogue Italia fashion editorial"
- "Apple product photography style — minimal, clean, precise"
- "Wes Anderson color palette and symmetrical composition"

These references compress complex visual descriptions into immediately understood shorthand.


Multi-Turn Editing: Photoshop You Can Speak To

This is Nano Banana Pro's killer feature. After generating an image, you can modify it through conversation — and the model retains context about what it created.

Example multi-turn editing session:

Turn 1 (generate):
"A woman in a white linen dress standing in a field of lavender
at golden hour, Provence, France. Medium shot, 85mm lens,
shallow depth of field."

→ AI generates image

Turn 2 (adjust environment):
"Make the lavender field more dense and vibrant. Add distant
rolling hills in the background."

→ AI modifies the same image

Turn 3 (adjust subject):
"Change her dress to pale blue. Add a straw sun hat."

→ AI updates just the clothing, keeping everything else

Turn 4 (adjust lighting):
"Make the golden hour light warmer and more directional,
coming from the left. Add a subtle lens flare."

→ AI adjusts lighting across the entire image

Turn 5 (final polish):
"Add very subtle film grain, as if shot on Kodak Portra 400.
Slightly desaturate the overall color palette by about 10%."

→ AI applies the finishing treatment

Each turn builds on the previous. You're not regenerating from scratch — you're directing incremental changes to a persistent image. This is why practitioners call it "Photoshop you can speak to."

Multi-turn editing best practices:

  • Make ONE change per turn for maximum control. Multiple changes in one turn can cause unintended side effects.
  • If the model changes something you didn't ask for, immediately say: "Undo the last change to [specific element]. Keep everything else."
  • Add "Do not change anything else" to critical turns where you want surgical precision.
  • If the conversation drifts too far from the original, start a new session with the best image uploaded as a reference.
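Via the API, this workflow maps naturally onto a chat session, which carries the image as shared context between turns. A hedged sketch using the google-genai SDK's chat interface; the live calls run only when an API key is set, since they require network access:

```python
import os

EDIT_TURNS = [
    "Make the lavender field more dense and vibrant.",
    "Change her dress to pale blue. Add a straw sun hat.",
    "Make the golden hour light warmer, coming from the left.",
    "Add very subtle film grain, as if shot on Kodak Portra 400.",
]

def run_edit_session(api_key: str) -> None:
    """Generate once, then direct incremental edits one turn at a time."""
    from google import genai  # requires the google-genai package

    client = genai.Client(api_key=api_key)
    chat = client.chats.create(model="gemini-3-pro-image-preview")
    chat.send_message(
        "A woman in a white linen dress standing in a field of lavender "
        "at golden hour, Provence, France. Medium shot, 85mm lens."
    )
    for turn in EDIT_TURNS:
        # One change per turn; the session keeps the image as context.
        chat.send_message(turn + " Do not change anything else.")

if os.environ.get("GEMINI_API_KEY"):
    run_edit_session(os.environ["GEMINI_API_KEY"])
```

Note the two best practices baked in: one change per turn, and an explicit "Do not change anything else" on every edit.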

Google Search Grounding: Factually Accurate Visuals

A unique Nano Banana Pro capability: images can be grounded in real-time web data via Google Search.

What this enables:

"Create an infographic showing the current top 5 most valuable
companies in the world by market cap, with their actual logos
and real current values."

With Search grounding enabled, the model queries Google Search for current data, then generates the visual incorporating factual information. This is transformative for infographics, data visualizations, news graphics, and educational content.

Enabling Search grounding via API:

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Create a visual showing the current weather forecast "
             "for Tokyo this week with temperature highs and lows",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        tools=[{"google_search": {}}]  # Enable search grounding
    )
)

Use cases for creatives:

  • Infographics with real, current data
  • Maps with accurate geographical information
  • Product comparisons with actual pricing
  • Event posters with correct dates and details
  • Educational visuals with factual accuracy

Practical Exercise

Exercise: Master the Multi-Turn Editing Workflow

  1. Open the Gemini app or Google AI Studio
  2. Generate a starting image with this prompt:
A cozy home office desk setup, shot from a 45-degree overhead angle.
MacBook Pro open on a wooden desk, ceramic coffee mug, small potted
succulent, notebook with a pen. Natural window light from the left.
Clean, minimal, Kinfolk magazine aesthetic. Shot on Fujifilm GFX 50S.
  3. Over the next 5 turns, make these sequential edits:

    • Turn 2: Change the time of day to evening — replace window light with warm desk lamp light
    • Turn 3: Add a pair of reading glasses next to the notebook
    • Turn 4: Change the mug from ceramic to a clear glass mug with tea
    • Turn 5: Apply a warmer color grade, as if shot on Kodak Gold 200 film
    • Turn 6: Generate at the highest available resolution
  4. Compare your final image to your starting image. Notice how each turn added a layer of specificity while maintaining the core composition.

This exercise builds the muscle of iterative direction — the same approach used in professional retouching, but using conversation instead of Photoshop tools.


Key Takeaways

  • Nano Banana Pro is architecturally different from diffusion-only tools like Midjourney — it reasons about images before generating, producing more accurate and controllable results.
  • Write prompts like a photographer, not a prompter. Camera specs, lighting descriptions, and film stock references produce dramatically better output.
  • Use narrative descriptions rather than keyword lists — the language model thrives on storytelling context.
  • Multi-turn editing is the killer feature. Generate once, then refine conversationally over 3-5 turns for precision control.
  • Google Search grounding enables factually accurate visual content — unique to this platform.
  • Generate at 1K for iteration, 4K for final delivery to manage costs while maintaining quality.
