APOSTLE
AI Creative Director
Module 03 Gemini Image Generation

Gemini API Setup and Image Generation

Set up the Gemini API for image generation, learn the available models, and write Python code for generating and editing images.

16 min
Intermediate
Lesson 05 of 14


Available Models

| Model | Strengths | Best For |
| --- | --- | --- |
| gemini-3.1-flash | Fast, cost-effective, good text rendering | Rapid prototyping, high-volume generation, text-in-image |
| gemini-3-pro | Highest quality, best instruction following | Hero shots, complex scenes, professional output |
| gemini-2.5-flash | Balanced speed and quality, strong reasoning | Multi-step editing, analytical tasks with visual output |
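The names in this table identify model families; the exact API IDs of image-output variants may carry extra suffixes. A minimal sketch for checking which IDs your key can reach, assuming the google-genai SDK's `client.models.list()` (which yields models with a `name` such as `models/<id>`) and assuming, as a heuristic only, that image-output Gemini models contain "image" in their ID. Both helper names are hypothetical:

```python
def image_model_ids(names):
    """Return sorted Gemini model IDs whose name contains "image" (a heuristic)."""
    ids = set()
    for name in names:
        model_id = name.removeprefix("models/")  # API names look like "models/<id>"
        if model_id.startswith("gemini") and "image" in model_id:
            ids.add(model_id)
    return sorted(ids)


def list_image_models(client):
    """List likely image-capable model IDs visible to `client` (a google-genai Client)."""
    return image_model_ids(model.name for model in client.models.list())
```

Usage: after `from google import genai`, run `print(list_image_models(genai.Client()))` and use one of the printed IDs as the `model` argument in the examples below.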

Setup

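Install the Google Gen AI Python SDK, plus Pillow for opening and saving images:

pip install google-genai pillow

The older google-generativeai package is deprecated, and Google's current image-generation examples use google-genai.

Create a client, reading the API key from an environment variable:

import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

Security note: Never hardcode API keys in source files. Export GEMINI_API_KEY in your shell or load it from a secrets manager. When the variable is set, genai.Client() with no arguments picks it up automatically.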
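A bare `os.environ["GEMINI_API_KEY"]` fails with a `KeyError` that does not say how to fix the problem. A minimal sketch of a fail-fast lookup; `load_api_key` is a hypothetical helper, and the variable name follows this lesson:

```python
import os


def load_api_key(env=None, var="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise a helpful error."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()  # treat whitespace-only values as missing
    if not key:
        raise RuntimeError(
            f"{var} is not set. Export it in your shell before running, "
            "and never commit keys to source control."
        )
    return key
```

You would then pass `api_key=load_api_key()` when creating the client.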


Basic Image Generation

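# Requires: pip install google-genai pillow
import io
import os

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    # Model name from this lesson; confirm the exact image-capable ID with client.models.list()
    model="gemini-3.1-flash",
    contents=(
        "Generate an image: A minimalist product photograph of a white "
        "ceramic vase on a marble surface, soft directional window light "
        "from camera-left, long shadow, neutral background, shot on "
        "Hasselblad X2D"
    ),
    # Request image output via response_modalities (response_mime_type is for text formats like JSON)
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Save the generated image; the response can mix text and image parts,
# so look for the image instead of assuming it is parts[0]
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = Image.open(io.BytesIO(part.inline_data.data))
        image.save("output.png")
        print("Image saved to output.png")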
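A response can interleave text with one or more images, and a blocked or text-only reply contains no image at all. A minimal duck-typed sketch that writes every image part to disk, prints any text, and raises if nothing came back. It assumes only the `candidates[0].content.parts`, `part.text`, and `part.inline_data.data` / `.mime_type` fields used above; `save_image_parts` is a hypothetical name, and the bytes are written as-is without re-encoding:

```python
import mimetypes
from pathlib import Path


def save_image_parts(response, stem="output"):
    """Write each inline image in a response to <stem>_<n><ext>; return the paths."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates or candidates[0].content is None:
        raise RuntimeError("Response has no content (it may have been blocked)")
    paths = []
    for part in candidates[0].content.parts or []:
        text = getattr(part, "text", None)
        blob = getattr(part, "inline_data", None)
        if text:
            print(text)  # models often add a short caption or explanation
        elif blob is not None and blob.data:
            # Derive the file extension from the MIME type, defaulting to .png
            ext = mimetypes.guess_extension(blob.mime_type or "") or ".png"
            path = Path(f"{stem}_{len(paths)}{ext}")
            path.write_bytes(blob.data)  # raw image bytes, no re-encoding
            paths.append(path)
    if not paths:
        raise RuntimeError("Response contained no image parts")
    return paths
```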

Image Editing (Image + Text Input)

# Requires: pip install google-genai pillow
import io
import os

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Load the source image; the SDK accepts PIL images directly in contents
source_image = Image.open("source.png")

response = client.models.generate_content(
    # Model name from this lesson; confirm the exact image-capable ID with client.models.list()
    model="gemini-3-pro",
    contents=[
        "Edit this image: Change the background to a sunset beach scene "
        "while keeping the subject exactly the same. Maintain the same "
        "lighting direction but add warm golden-hour color grading.",
        source_image,
    ],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Save the first image part; text parts, if any, are skipped
for part in response.candidates[0].content.parts:
    if part.inline_data:
        edited_image = Image.open(io.BytesIO(part.inline_data.data))
        edited_image.save("edited.png")
        print("Edited image saved to edited.png")
        break
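The edit prompt above follows a useful pattern: state the change, then spell out what must stay fixed. If you script many edits, a small helper keeps that structure consistent. A minimal sketch; `build_edit_prompt` and its parameters are hypothetical and not part of any SDK:

```python
def build_edit_prompt(change, keep=(), adjust=()):
    """Compose an edit instruction: the change, what to preserve, and any tweaks."""
    lines = [f"Edit this image: {change.strip().rstrip('.')}."]
    if keep:
        lines.append("Keep exactly the same: " + "; ".join(keep) + ".")
    if adjust:
        lines.append("Also: " + "; ".join(adjust) + ".")
    return " ".join(lines)
```

For example, `build_edit_prompt("Change the background to a sunset beach scene", keep=["the subject", "the lighting direction"], adjust=["add warm golden-hour color grading"])` returns "Edit this image: Change the background to a sunset beach scene. Keep exactly the same: the subject; the lighting direction. Also: add warm golden-hour color grading."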

Multi-Turn Editing (Conversational Refinement)

Multi-turn editing is one of Gemini's standout features: you refine an image through conversation, and each instruction builds on the previous result.

# Requires: pip install google-genai pillow
import io
import os

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# A chat session keeps history, so each turn edits the previous result
chat = client.chats.create(
    # Model name from this lesson; confirm the exact image-capable ID with client.models.list()
    model="gemini-3-pro",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)


def save_first_image(response, path):
    """Save the first image part of a response to path."""
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            Image.open(io.BytesIO(part.inline_data.data)).save(path)
            return
    raise RuntimeError(f"No image returned for {path}")


# Turn 1: Generate the base image
response = chat.send_message(
    "Generate an image: A confident businesswoman in a navy suit "
    "standing in a modern office lobby with floor-to-ceiling windows, "
    "shot on Canon EOS R5, natural light"
)
save_first_image(response, "turn1.png")

# Turn 2: Refine the lighting
response = chat.send_message(
    "Make the lighting more dramatic: add Rembrandt lighting with "
    "stronger shadows on the right side of her face"
)
save_first_image(response, "turn2.png")

# Turn 3: Adjust color grading
response = chat.send_message(
    "Apply a teal and orange color grade, desaturate slightly, "
    "add subtle film grain"
)
save_first_image(response, "turn3.png")
print("All turns saved.")
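The three turns above repeat one pattern: send an instruction, save the image, continue. A minimal sketch that loops over a list of refinements on a single session. It assumes only a chat object with `send_message(text)`, as in the example above, and takes a caller-supplied `save(response, path)` function; `run_refinements` is a hypothetical name:

```python
def run_refinements(chat, instructions, save, stem="turn"):
    """Send each instruction in order on one chat session and save every result.

    Because the chat keeps history, turn N edits the image from turn N-1.
    Returns the list of file paths written.
    """
    paths = []
    for turn, instruction in enumerate(instructions, start=1):
        response = chat.send_message(instruction)
        path = f"{stem}{turn}.png"
        save(response, path)
        paths.append(path)
    return paths
```

Pass the three prompts above as `instructions` and any function that writes a response's first image part to a path as `save`; adding a fourth refinement is then a one-line change to the list.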

Reference Images (Up to 14)

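Gemini can accept up to 14 reference images in a single prompt; limits vary by model, so check the documentation for the one you use. The prompt refers to each reference by its position in the request (image 1, image 2, and so on):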

# Requires: pip install google-genai pillow
import io
import os

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Load reference images; their order defines "image 1", "image 2", "image 3"
style_ref = Image.open("style_reference.png")
subject_ref = Image.open("subject_reference.png")
composition_ref = Image.open("composition_reference.png")

response = client.models.generate_content(
    # Model name from this lesson; confirm the exact image-capable ID with client.models.list()
    model="gemini-3-pro",
    contents=[
        "Generate a new image that combines the color palette and mood "
        "from image 1, the subject appearance from image 2, and the "
        "compositional framing from image 3. The scene should be an "
        "outdoor cafe in autumn.",
        style_ref,
        subject_ref,
        composition_ref,
    ],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Save the first image part; text parts, if any, are skipped
for part in response.candidates[0].content.parts:
    if part.inline_data:
        result = Image.open(io.BytesIO(part.inline_data.data))
        result.save("multi_ref_output.png")
        break
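Because the prompt refers to references by position, it can help to place a short text label before each image so the mapping is explicit. A minimal sketch, assuming the SDK accepts a mixed list of strings and images as contents (as the example above does) and using the 14-image ceiling stated in this lesson; `build_reference_contents` is a hypothetical helper, and labeling is a prompting choice rather than a documented requirement:

```python
MAX_REFERENCE_IMAGES = 14  # ceiling stated in this lesson; may differ per model


def build_reference_contents(instruction, references):
    """Interleave "Image N (role):" labels with reference images.

    references: list of (role, image) pairs, where image is anything the SDK
    accepts in contents (for example a PIL.Image). Returns the contents list.
    """
    if not references:
        raise ValueError("At least one reference image is required")
    if len(references) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"{len(references)} references exceeds the limit of {MAX_REFERENCE_IMAGES}"
        )
    contents = [instruction]
    for number, (role, image) in enumerate(references, start=1):
        contents.append(f"Image {number} ({role}):")
        contents.append(image)
    return contents
```

For the cafe example, you would pass `[("color palette and mood", style_ref), ("subject", subject_ref), ("composition", composition_ref)]` and use the returned list as `contents`.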

Multi-Person Generation

# Requires: pip install google-genai pillow
import io
import os

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Load individual character references; order defines "image 1" and "image 2"
person_a = Image.open("character_a.png")
person_b = Image.open("character_b.png")

response = client.models.generate_content(
    # Model name from this lesson; confirm the exact image-capable ID with client.models.list()
    model="gemini-3-pro",
    contents=[
        "Generate an image of these two people sitting across from "
        "each other at a restaurant table. Person from image 1 is on "
        "the left, person from image 2 is on the right. Candlelight "
        "dinner setting, shallow depth of field, shot on Leica M11. "
        "Maintain the exact facial features of both people.",
        person_a,
        person_b,
    ],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# Save the first image part; text parts, if any, are skipped
for part in response.candidates[0].content.parts:
    if part.inline_data:
        result = Image.open(io.BytesIO(part.inline_data.data))
        result.save("multi_person.png")
        break

Resolution and Pricing

Gemini Flash

| Output Resolution | Aspect Ratio | Price per Image |
| --- | --- | --- |
| 1024 x 1024 | 1:1 | ~$0.002 |
| 1536 x 1024 | 3:2 | ~$0.003 |
| 1024 x 1536 | 2:3 | ~$0.003 |
| 1920 x 1080 | 16:9 | ~$0.003 |

Gemini Pro

| Output Resolution | Aspect Ratio | Price per Image |
| --- | --- | --- |
| 1024 x 1024 | 1:1 | ~$0.02 |
| 1536 x 1024 | 3:2 | ~$0.03 |
| 1024 x 1536 | 2:3 | ~$0.03 |
| 1920 x 1080 | 16:9 | ~$0.03 |

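Note: These prices and resolutions are approximate and vary by model version. Use them for relative comparison (Pro costs roughly 10x Flash per image here) and confirm against Google's current pricing page before budgeting. Input images (references) incur additional token costs, and in multi-turn chats the earlier turns, including their images, are resent as input on every turn.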
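To budget a batch or a Flash vs. Pro comparison, multiply the per-image figure by the number of images. A minimal sketch using the approximate prices from the tables above (not live rates, and ignoring input-token costs for references and chat history); `estimate_cost` and the tier keys are hypothetical:

```python
# Approximate per-image prices from the tables above (USD); not live pricing
PRICE_PER_IMAGE = {
    "flash": {"1:1": 0.002, "3:2": 0.003, "2:3": 0.003, "16:9": 0.003},
    "pro": {"1:1": 0.02, "3:2": 0.03, "2:3": 0.03, "16:9": 0.03},
}


def estimate_cost(tier, aspect_ratio, count):
    """Return the estimated output cost in USD for `count` images."""
    try:
        price = PRICE_PER_IMAGE[tier][aspect_ratio]
    except KeyError:
        raise ValueError(f"No price listed for {tier!r} at {aspect_ratio!r}") from None
    return round(price * count, 4)  # round away floating-point noise
```

For a 500-image 16:9 run, `estimate_cost("flash", "16:9", 500)` gives 1.5 and `estimate_cost("pro", "16:9", 500)` gives 15.0, which makes the tenfold gap concrete.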


SynthID Watermarking

All images generated by Gemini include SynthID, Google's imperceptible digital watermark. Key points:

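  • Invisible: imperceptible to the human eye, with no visible effect on image quality.
  • Machine-detectable: Google's verification tools can check an image for the watermark, which indicates it was generated with Google AI.
  • Robust to common edits: designed to survive resizing, cropping, and light filtering, though heavy manipulation is not guaranteed to preserve it.
  • Provenance: SynthID is Google's own technology rather than an open industry standard; it complements C2PA Content Credentials, a separate metadata-based provenance standard.
  • No opt-out: SynthID is always applied, so plan commercial workflows on the assumption that every output is watermarked.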

Exercise

API Workflow Challenge

  1. Set up a Gemini API project with proper environment variable configuration.
  2. Write a script that generates a base product photo.
  3. Use multi-turn editing to make 3 sequential refinements (lighting, color grade, background).
  4. Load 3 reference images and generate a new image that combines elements from all three.
  5. Compare the cost and quality of Flash vs. Pro for the same prompt.
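For step 5, a small harness can run the same prompt through each model and record wall-clock time, leaving the quality judgment to you. A minimal sketch: it takes a caller-supplied `generate(model, prompt)` function (for example, one wrapping the `generate_content` call from this lesson), so nothing here depends on a particular SDK version. `compare_models` is a hypothetical name:

```python
import time


def compare_models(generate, prompt, models, runs=1):
    """Call generate(model, prompt) `runs` times per model and time each call.

    Returns {model: {"responses": [...], "avg_seconds": float}} so you can
    save and inspect the images side by side afterwards.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    results = {}
    for model in models:
        responses, elapsed = [], []
        for _ in range(runs):
            start = time.perf_counter()
            responses.append(generate(model, prompt))
            elapsed.append(time.perf_counter() - start)
        results[model] = {
            "responses": responses,
            "avg_seconds": sum(elapsed) / len(elapsed),
        }
    return results
```

Running each model more than once (`runs=3`, say) smooths out network jitter; pair the timings with the per-image prices above to weigh speed and cost against the quality you see.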