Gemini Image Generation Research

Client-Ready Hero Images

Hero images for real Jezweb business types. Each image is generated from search-context references (location + industry) and a detailed narrative prompt, and each is ready to ship.

Electrician — Newcastle

2 refs · temp 0.7

Hi-vis polo, tool belt, brick house with terracotta roof, white work van, native grevilleas. Sun from the north.

Cafe — Armidale

2 refs · temp 0.7

Heritage building interior, exposed brick, Edison bulbs, La Marzocco, customers in winter scarves. Cold overcast light through window with bare deciduous trees.

Adventure Tourism — Port Stephens

2 refs · temp 0.7

Four young adults on Stockton sand dunes, golden afternoon light, Pacific Ocean stretching to horizon. Wide landscape with text overlay space.

Accountant — Maitland

2 refs · temp 0.7

Heritage building with arched window and wrought iron, exposed brick, modern desk. Heritage shopfronts of High Street visible through window.

Phase 2: Every Business Type

11 business types across 5 categories, all generated with search references + narrative prompts. The pipeline works for virtually any industry.

Restaurant — Newcastle Harbour

2 refs · food & hospitality

Harbour view through floor-to-ceiling windows, plated fish, wine glasses, exposed brick walls. Warm evening dining atmosphere.

Cafe Exterior — Berry

2 refs · food & hospitality

Heritage country-town shopfront, striped awning, outdoor tables, potted herbs, bicycle, patrons enjoying coffee in the sunshine.

Hair Salon — Melbourne

2 refs · health & beauty

Industrial-chic interior with exposed brick, tattooed stylist, round gold mirrors, professional product shelves.

Gym & Fitness Trainer

2 refs · health & beauty

CrossFit-style warehouse gym, PT working with client, kettlebells and equipment racks, exposed brick and steel.

Landscaper — Native Garden

2 refs · outdoor trades

Landscaper in Akubra planting native kangaroo paws, modern Australian home, flagstone path, golden hour light filtering through eucalyptus.

Plumber — Bathroom Reno

2 refs · trades

Professional plumber working under a modern vanity, tool bag on floor, stone-tile bathroom, glass shower screen. Clean, competent image.

Mechanic — Workshop

2 refs · trades

Professional mechanic with tablet, car on hoist, blue tool chest, clean organised workshop. Slightly generic but usable.

Solar Installer — Roof Install

2 refs · tech & trades

Two installers in hi-vis on terracotta tile roof, Australian brick suburb, Hills Hoists in yards, neighbouring solar panels already installed.

Wedding Venue — Hunter Valley

2 refs · events

Outdoor ceremony with Brokenback Range backdrop, vineyard rows, timber arch with roses, white cross-back chairs, scattered petals. Golden hour.

Real Estate Agent

2 refs · events & property

Female agent with portfolio outside a weatherboard cottage, white picket fence, hydrangeas, bullnose verandah. Classic Australian listing photo.

Tech Company — Office

2 refs · technology

Modern open-plan office, standing desks, whiteboard with sticky notes, plants as dividers, team collaborating. Clean and professional.

Vet Clinic — Consultation

4 refs · health

Warm modern consultation room, teal-green accent wall, timber cabinetry. Vet examining a relaxed golden retriever while the owner watches with a smile. Indoor plants, natural window light.

Panorama Reference Breakthrough

We used the streetview Python library to download actual 360-degree panoramas (16,384 x 8,192 pixels) from Google Maps user photos. Cropped to the relevant viewing direction, these panoramas provide the best reference quality we've seen.
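
The cropping step can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiments: the function name `pano_crop_boxes`, the default fields of view, and the convention that column 0 corresponds to heading 0 degrees are all assumptions. It only computes pixel boxes for an equirectangular panorama, so it needs no imaging library.

```python
def pano_crop_boxes(width, height, heading_deg, hfov_deg=90.0, vfov_deg=60.0):
    """Return (left, top, right, bottom) pixel boxes for one viewing direction.

    Assumes an equirectangular panorama where column 0 is heading 0 degrees,
    headings increase left to right, and the horizon sits at mid-height.
    Two boxes are returned when the view wraps around the image seam.
    """
    px_per_deg_x = width / 360.0
    px_per_deg_y = height / 180.0

    center_x = (heading_deg % 360.0) * px_per_deg_x
    half_w = hfov_deg / 2.0 * px_per_deg_x

    top = round(height / 2.0 - vfov_deg / 2.0 * px_per_deg_y)
    bottom = round(height / 2.0 + vfov_deg / 2.0 * px_per_deg_y)
    left = round(center_x - half_w)
    right = round(center_x + half_w)

    if left < 0:  # view spills over the left edge of the image
        return [(left + width, top, width, bottom), (0, top, right, bottom)]
    if right > width:  # view spills over the right edge of the image
        return [(left, top, width, bottom), (0, top, right - width, bottom)]
    return [(left, top, right, bottom)]


# A 90-degree-wide view centred on heading 180 in a 16,384 x 8,192 panorama.
boxes = pano_crop_boxes(16384, 8192, heading_deg=180)
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts. A flat crop like this keeps the equirectangular distortion, so a narrower field of view generally gives a more natural-looking reference.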

Tomaree Head — Panorama Refs

3 pano crops · 360° source

The definitive Tomaree result. Metal summit walkway stairs, Zenith Beach on the left, Shoal Bay with boats on the right, green headland between. Fixed the sand dune problem completely.

Tomaree — AI-Curated + QA Loop

curated refs · 2 QA passes

Gemini curated 8 search candidates down to the 3 best refs and generated an image, then a QA vision check flagged issues. V2, generated after the QA feedback, is dramatically better than V1.
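
A minimal sketch of the generate, check, regenerate loop, under stated assumptions: `generate` and `qa_check` are hypothetical callables standing in for the image model and the vision QA model, and the way feedback is appended to the prompt is illustrative only.

```python
def generate_with_qa(generate, qa_check, prompt, refs, max_passes=2):
    """Generate an image, run a QA check, and retry with the feedback.

    generate(prompt, refs) -> image and qa_check(image) -> list of issue
    strings are hypothetical callables supplied by the caller.
    Returns (image, passes_used, remaining_issues).
    """
    current_prompt = prompt
    image = None
    issues = []
    for attempt in range(1, max_passes + 1):
        image = generate(current_prompt, refs)
        issues = qa_check(image)
        if not issues:
            return image, attempt, issues
        # Feed the flagged problems back into the next generation pass.
        current_prompt = prompt + "\nFix these issues: " + "; ".join(issues)
    return image, max_passes, issues
```

Capping the loop at two passes mirrors the "2 QA passes" noted above and keeps the cost per image bounded.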

Armidale Beardy St — Panorama Refs

3 pano crops · 360° source

Red brick pedestrian mall, octagonal brick fountain, bare winter deciduous trees, overcast sky, people in warm jackets. Heritage lamp posts and timber benches.

Maitland High St — Panorama Refs

3 pano crops · 360° source

Victorian-era parapets, iron-lace verandahs, curved road, heritage shopfronts, parked cars. The panorama refs captured the exact streetscape character.

Tamworth Peel St — Panorama Refs

2 pano crops + 2 search refs

Australia's country music capital. Wide main street with heritage shopfronts, golden guitar sculpture, palm trees, clock tower in the distance. Blue sky, dry country light.

Cessnock Vincent St — Panorama Refs

2 pano crops + 2 search refs

Heritage main street with art deco facades. The model interpreted the historical references too literally — vintage cars and dusty road instead of modern Cessnock. Instructive failure mode.

Multi-Source Reference Selection

Instead of relying solely on Google Images, we pull references from 4 sources in parallel: Google Images (with improved negative keyword queries), Google Maps Places API (real user photos), Wikimedia Commons, and Streetview panoramas. AI curation then picks the best 3-4 refs based on location accuracy, normal state (no events/decorations), composition value, and technical quality.
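
The parallel fetch can be sketched as below. This is an illustration under assumptions, not the pipeline's actual code: `gather_candidates` and the per-source fetcher callables are hypothetical names, and each fetcher is assumed to take a query string and return a list of image URLs. A source that fails or returns nothing simply contributes no candidates.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_candidates(fetchers, query):
    """Query every reference source in parallel and merge the results.

    fetchers maps a source name to a hypothetical callable(query) -> list of
    image URLs. A source that raises is treated as returning no results, so
    one failing source does not sink the whole run.
    """
    candidates = []
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in fetchers.items()}
        for name, future in futures.items():
            try:
                urls = future.result()
            except Exception:
                urls = []  # tolerate a failed source
            candidates.extend({"source": name, "url": url} for url in urls)
    return candidates
```

The merged list, tagged by source, is what would then be handed to the AI curation step for ranking.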

Tomaree Head — Multi-Source

9 candidates · 4 sources · 4 selected

Best Tomaree result yet. Metal summit stairs, Zenith Beach left, Shoal Bay with boats right, golden hour. Curation correctly rejected streetview forest panoramas and annotated tourist images.

Armidale Beardy St — Multi-Source

9 candidates · 4 sources · 3 selected

Heritage red brick mall, fountain, winter trees. Curation correctly rejected Booloominbah mansion (wrong location entirely), night shots, and back-of-building views. Google Maps Places API contributed a key reference.

Maitland High St — Multi-Source

4 candidates · 2 sources · 3 selected

Victorian parapets, iron-lace verandahs, golden hour. Google Images returned zero results for Maitland, but the multi-source approach meant Wikimedia + Streetview still produced a quality hero. Resilience through diversity.

Newcastle Beach — Multi-Source

Partial miss · 8 candidates · 3 selected

References were correct (rectangular Newcastle Ocean Baths), but the model generated a Bondi-style curved pool. Sydney beach training bias overrode the actual references. Beach and city are plausible but not specifically Newcastle. Instructive failure mode.

Hunter Valley — Multi-Source

9 candidates · 4 sources · 3 selected

Neat vine rows stretching to the Brokenback Range. Rustic timber cellar door, golden hour light, hot air balloon floating in the distance. Quintessential Hunter Valley wine country.

Port Macquarie — Multi-Source

8 candidates · 4 sources · 3 selected

Tacking Point Lighthouse on a rocky headland with crashing Pacific waves. Green coastal scrub, dramatic sky catching golden hour light. Coastline stretching into the distance.

Complete Image Library Suites

Full sets of website images generated for a single business. The wedding suite uses scraped client photos as style references (reference-guided). The cafe suite uses only prompts (cold-start). Both produce cohesive, professional results.

Wedding Venue — Ceremony Hero

Reference-guided · 8-image suite

Vineyard ceremony with Brokenback Range, timber arbour, seated guests. Style matched from scraped client photos. Open sky left for text overlay.

Wedding Venue — Reception Table

Reference-guided · Style matched

Burgundy napkins, blush/burgundy florals, brass candle holders, crossback chairs, lavender hedges. Nearly identical to the scraped reference photo below.

Scraped Reference — Real Photo

Source · huntervalleyweddingplanner.com.au

The actual photo scraped from the client website and used as Gemini style reference. Compare with the AI-generated version above.

Wedding Venue — Couple Portrait

Reference-guided

Candid feel: couple walking through vineyard rows. Linen suit, minimalist gown, warm backlight. Style perfectly matches the scraped couple reference.

Scraped Reference — Real Photo

Source · huntervalleyweddingplanner.com.au

Real wedding photography used as Gemini style reference. The AI version above captures the vineyard setting, colour grading, and editorial mood.

Wedding Venue — Food & Wine

Reference-guided

Duck with sweet potato puree, Hunter Valley Shiraz, burgundy napkin. The colour palette (burgundy/blush/sage) threads through all 8 images in the suite.

Bakery/Cafe — Hero (Cold-Start)

No references · 6-image suite

Exposed brick, pastry display, Edison bulbs, polished concrete. Generated with zero reference images. 5-part prompt framework creates cohesion through shared anchors.

Bakery/Cafe — Interior

No references · Cold-start

Pressed tin ceiling, exposed brick, recycled timber, monstera plant, newspaper reader. Unmistakably Australian cafe culture. The standout cold-start image.

Bakery/Cafe — Sourdough

No references

Professional food photography quality. Beautiful ear, flour dusting, wheat stalks as props, warm brick bakery in soft focus. Could pass for a real shoot.

Bakery/Cafe — Team Portrait

No references

Two bakers in dark aprons, genuine smiles, flour on hands, kitchen equipment behind. Hands rendered naturally (often an AI failure point).

Hero Image Variants

Same location, different compositions. The model generates distinct hero styles from the same reference photos. All 16:9 with text overlay zones.
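
As a rough sketch, a hero request combining the composition instruction with the 16:9 setting might be assembled like this. The `imageConfig` / `aspectRatio` fields and the composition wording come from the Key Findings table; the surrounding payload shape (`contents`, `parts`, `generationConfig`), the function name, and the default temperature are assumptions, and reference images are omitted.

```python
def build_hero_request(scene, text_side="left", text_fraction=40, temperature=0.8):
    """Build a request body for a 16:9 hero image with a text overlay zone.

    The outer payload layout is an assumed request shape, not a confirmed one.
    Reference images would be added as extra parts and are left out here.
    """
    subject_side = "right" if text_side == "left" else "left"
    composition = (
        f"Subject on {subject_side} {100 - text_fraction}%, "
        f"{text_side} {text_fraction}% clear sky for text."
    )
    return {
        "contents": [{"parts": [{"text": f"{scene} {composition}"}]}],
        "generationConfig": {
            "temperature": temperature,
            "imageConfig": {"aspectRatio": "16:9"},
        },
    }


request = build_hero_request("Newcastle Ocean Baths at golden hour.")
```

Keeping the composition sentence in one helper makes it easy to produce text-left and text-right variants of the same scene.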

Newcastle Beach — Text-Left Hero

5 refs · 16:9

Pool, beach, and city on the right 60%. Left 40% has clean golden sky for headline text overlay. Production hero layout.

Newcastle Baths — Atmospheric

5 refs · 16:9

Pre-sunrise blue hour. Pool water glowing turquoise against dark rocks. City lights twinkling on the hill behind. Cinematic mood.

Hunter Valley — Cellar Door

4 refs · 16:9

Interior generated from exterior vineyard photos. Timber bar, stone walls, vineyard visible through windows. The model inferred the interior style from the region.

Hunter Valley — Aerial

4 refs · 16:9

Geometric vine rows on rolling green hills, Brokenback Range mountains, morning mist in valleys. Drone perspective at 100m.

Port Stephens — Sand Dunes

5 refs · 16:9

Massive golden Stockton dunes, Pacific Ocean beyond, two figures for scale. Late afternoon light creating long shadows. Text-left hero layout.

Port Stephens — Tomaree Summit

5 refs · 16:9

180-degree view from Tomaree Head. Shoal Bay, Zenith Beach below, harbour entrance, dunes in the distance. Epic golden hour landscape.

Port Stephens — Bay Aerial

4 refs · 16:9

Crystal clear turquoise water, boats at anchor in a sheltered bay, white sand beach, lush green headlands. Golden hour light. Tourism hero material.

Leura — Autumn Village

4 refs · 16:9

Blue Mountains heritage village ablaze with autumn colour. Golden yellows, warm oranges, deep reds. Boutique shopfronts, people in warm scarves. Crisp mountain light.

Accuracy Highlights

The model produces surprisingly accurate Australian scenes, even when given bad references or no references at all.

Newcastle Ocean Baths repeatability test

Repeatability Test — Same Prompt x3

Consistent · temp 0.8

Same references + same prompt, run 3 times. All 3 show the same location elements. Viewpoint varies ~20-30 degrees, giving meaningful variety while maintaining geographic accuracy.

Armidale — Resilience to Bad Refs

Street View refs

Street View coordinates landed on a roundabout with a BP station. The model completely ignored those references and generated an accurate heritage mall with brick fountain, bare winter trees, and overcast sky.

Key Findings

What we learned across 18 experiments and 130+ images.

Capability	Status	Best Approach
Stock photo replacement	Ready	Detailed prompt + camera specs + AU context
Reference-inspired generation	Ready	Feed stock photo as reference + detailed prompt
Famous landmarks	Ready	Plain prompt (training data sufficient)
Regional landmarks	Ready	Search-based reference pipeline (3-5 photos)
Client hero images	Ready	Two-query search (location + industry) + narrative
Hero composition (text overlay)	Ready	"Subject on right 60%, left 40% clear sky for text"
16:9 hero images	Ready	imageConfig: { aspectRatio: "16:9" }
Generate-until-happy workflow	Ready	Temperature 0.8, run 3x, pick best (~$0.15)
AI reference curation	Ready	Gemini Flash ranks 8 candidates, picks best 3-4 for generation
QA verification loop	Ready	Vision model catches errors; 2 passes dramatically improve accuracy
Panorama references	Ready	360-degree user panoramas provide best reference quality
Trades & services	Ready	Solar, plumbing, landscaping, hair, fitness all client-ready
Multi-source references	Ready	4 sources + AI curation rejects wrong locations, events, night shots; resilient when one source fails
Full image library suite	Ready	8 cohesive images per business from scraped reference photos; cold-start also viable
Website photo scraping	Ready	WordPress sites richest (50+ images); categorised by hero/work/team/logo automatically
Instagram scraping (Apify)	Ready	1080px business photos at ~$0.003/result; Facebook blocked without auth
Cold-start generation	Ready	5-part framework + shared prompt anchors create cohesion without references
Curation reliability	Fixed	Increase maxOutputTokens to 4096; 7-step cascade parser as safety net; 5/5 success
Street-level accuracy	Limited	Coordinate targeting unreliable; panorama refs or curated search better
Icons / transparent assets	Use GPT	Gemini can't do transparency
Text rendering on images	Use GPT	GPT Image 1.5 better at text

Split strategy: Use Gemini for scenes, stock photo replacement, and business hero images. Use GPT Image 1.5 for icons, transparent assets, and text-on-image. Cost per image: ~$0.05 with search context references, generated in ~25 seconds. Add AI curation + QA for critical images (~$0.08 total).
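
The cost and time figures quoted in this report can be checked with a few lines of arithmetic. The sketch below uses only the approximate numbers stated here (~$0.05 and ~25 seconds per image, ~$0.08 with curation + QA). The function name is hypothetical, and it assumes images are generated one after another.

```python
PER_IMAGE_CENTS = 5      # ~$0.05 with search context references
CURATED_QA_CENTS = 8     # ~$0.08 with AI curation + QA added
SECONDS_PER_IMAGE = 25   # ~25 seconds per generation


def estimate(images, runs_per_image=1, curated_qa=False):
    """Rough cost (USD) and time (seconds), assuming sequential generation."""
    unit_cents = CURATED_QA_CENTS if curated_qa else PER_IMAGE_CENTS
    total_runs = images * runs_per_image
    return {"usd": total_runs * unit_cents / 100, "seconds": total_runs * SECONDS_PER_IMAGE}


three_tries = estimate(1, runs_per_image=3)   # generate-until-happy: 3 runs, $0.15
full_suite = estimate(8)                      # 8-image suite: $0.40, 200 seconds
```

Three runs of one image come to about $0.15, and an 8-image suite to about $0.40 and 200 seconds (a little over 3 minutes), in line with the figures quoted elsewhere in this report.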

Reference hierarchy: Client's own photos (scraped from website/Instagram) > SerpAPI Google Images (0.11s/query) > Google Maps Places API > Wikimedia > Cold-start prompts. WordPress sites are richest for scraping (50+ images vs near-zero from simple HTML sites). Reference-guided suites produce better palette consistency than cold-start, but cold-start is surprisingly good with the 5-part framework.
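
One way to read the hierarchy is as a fallback chain: fill the reference slots from the highest tier first and fall back to cold-start only when nothing is available. The sketch below is an interpretation under that assumption; the tier keys, the function name, and the cap of 4 references are illustrative.

```python
# Tiers in priority order, following the hierarchy described above.
REFERENCE_TIERS = [
    "client_photos",
    "serpapi_google_images",
    "google_maps_places",
    "wikimedia",
]


def choose_references(available, max_refs=4):
    """Pick references tier by tier, falling back to cold-start when empty.

    available maps a tier name to a list of image references (paths or URLs).
    Returns (mode, chosen) where chosen is a list of (tier, ref) pairs.
    """
    chosen = []
    for tier in REFERENCE_TIERS:
        for ref in available.get(tier, []):
            if len(chosen) < max_refs:
                chosen.append((tier, ref))
    mode = "reference-guided" if chosen else "cold-start"
    return mode, chosen
```

In the real pipeline the AI curation step would rank candidates within and across tiers. This sketch only captures the priority order.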

Production Pipeline

The validated workflow for generating client-ready image libraries.

Scrape

Client website + Instagram photos. WordPress sites yield 50+ images. Auto-categorise (hero, team, work, logo).

Search

SerpAPI + Google Maps Places for location context. 6-8 candidates per query. Fall back to cold-start if no scrape data.

Curate

Gemini Flash ranks candidates. 7-step cascade JSON parser (maxOutputTokens 4096). Top 3-4 selected as refs.

Generate

8-image suite: hero, interior, team, product, detail, food, event, exterior. Refs anchor palette + style.

QA Verify

Vision model checks output. If issues found, regenerate with feedback. Full suite ~$0.40, ~3 minutes.
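
The Curate step depends on getting valid JSON back from the model. The report does not list the seven steps of the cascade parser, so the sketch below is only an illustration of the general idea with a handful of fallbacks: try the raw text, then a fenced block, then the outermost braces, then the same candidates with trailing commas removed. The function name is hypothetical.

```python
import json
import re


def parse_model_json(text):
    """Try progressively more forgiving ways to extract JSON from model output.

    Returns the parsed value, or None if every fallback fails.
    """
    stripped = text.strip()
    attempts = [stripped]

    # Fallback: contents of a Markdown-style code fence.
    fence = re.search(r"```(?:json)?\s*(.*?)```", stripped, re.DOTALL)
    if fence:
        attempts.append(fence.group(1).strip())

    # Fallback: the span between the first "{" and the last "}".
    start, end = stripped.find("{"), stripped.rfind("}")
    if start != -1 and end > start:
        attempts.append(stripped[start:end + 1])

    # Fallback: the same candidates with trailing commas removed.
    # Note: this regex can also alter ",}" sequences inside string values.
    attempts += [re.sub(r",\s*([}\]])", r"\1", a) for a in list(attempts)]

    for candidate in attempts:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None
```

Returning `None` rather than raising lets the caller decide whether to retry the curation call or fall back to using the uncurated candidates.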

Experiment Log

All 18 experiments in chronological order.

#	Experiment	Images	Key Finding
1	Search Grounding Modes	15	Image grounding helps regional landmarks, not generic scenes
2	Multi-Model Benchmark	12	Gemini best for scenes, GPT for icons/text
3	Reference-Inspired Generation	15	Detailed prompt + reference beats grounding for generic scenes
4	Regional Location Accuracy	16	Grounding only activates for 3/8 locations; training data surprisingly good
5	Multi-Reference Pipeline	9	Multi-ref (3+) is the accuracy winner; we control the references
6	Street View + High-Ref-Count	7	Street View coordinates fragile; model ignores bad refs gracefully
7	Hero Consistency	9	16:9 hero images with text overlay zones work reliably
8	Same-Prompt Repeatability	3	Consistent location accuracy; viewpoint varies ~20-30 degrees
9	Client Scenarios	4	All 4 business types genuinely ship-ready
10	Smart Reference Selection + QA Loop	6	AI curation + QA verification loop dramatically improves accuracy
11	Streetview Panorama References	3	360-degree panoramas provide best reference quality; solved Tomaree problem
12	Expanded Business Types	11	11/12 business types client-ready; trades images exceptionally strong
13	Multi-Source Reference Selection	3	4 sources (Google Images, Maps Places, Wikimedia, Streetview); AI curation rejects wrong locations, events, night shots
14	DataForSEO vs SerpAPI Comparison	0	SerpAPI 23x faster (0.11s vs 2.48s); identical image quality; both scrape Google Images
15	Website Image Scraping	53	WordPress sites yield 50+ images; simple HTML sites nearly empty; auto-categorisation works
16	Instagram Scraping (Apify)	21	Instagram works at ~$0.003/result; Facebook blocked without auth; CDN URLs temporary
17	Wedding Venue Image Library (8 images)	8	Reference-guided from scraped photos; all 8 share burgundy/blush palette; style-consistent
18	Cafe/Bakery Cold-Start Suite (6 images)	6	No references needed; 5-part framework + shared anchors create cohesion; unmistakably Australian