Gemini 3: The Developer's Guide to the New AI Era
A week with Gemini 3 Pro: Deep Think on a real migration, a bug diagnosed from a 2-minute video, and the agentic stuff that still needs a babysitter.
Óscar Gallego
Web Developer
I’ve been testing Gemini 3 Pro for the past week, and honestly? I’m annoyed it’s this good.
Not because it’s bad. The opposite. I’d finally gotten comfortable with my Claude/GPT-4 workflow, and now Google drops this on November 18th and throws everything into chaos again. Classic.
But here’s the thing: after forcing myself to rebuild a feature I’d already finished (yeah, I rewrote working code just to test this), I get the hype. This isn’t your typical “10% better at benchmarks” release. Deep Think and actual multimodality (not the fake kind) change how you can work.
Let me show you what I mean.
What’s actually different this time
1. Deep Think, or why it actually pauses now
You know how previous models would just… start vomiting tokens? Mid-thought, zero planning, just word prediction on steroids?
Gemini 3 actually stops. It thinks. You’ll see “[Thinking…]” in the API response, and at first I thought my connection dropped. Nope. It’s genuinely spending compute time planning its answer before committing.
Where I’ve actually used this:
Last Thursday, I asked it to plan our migration from a monolithic Angular app to micro-frontends. Instead of immediately dumping code, it:
- Asked about our deployment pipeline (I didn’t even prompt for this)
- Identified 3 legacy dependencies that would break
- Suggested a phased rollout plan that actually made sense
Could Claude do this? Maybe. But Gemini 3 did it without me babysitting the prompts.
The catch: It’s slower. If you’re used to instant responses, the 3-5 second thinking pause feels eternal. But for architecture decisions? I’ll take accuracy over speed.
2. Multimodality without the duct tape
Every model claims “multimodality” now. Usually, it means they duct-taped three different models together and hoped for the best.
Gemini 3 is different: one model handles everything. Text, code, video, audio, images. Same weights, same architecture.
Real test I did:
I recorded a 2-minute Loom video of a UI bug (button wouldn’t disable after clicking). No transcript, no code snippets. Just me clicking around and complaining in Spanish.
Uploaded it. Asked: “What’s broken?”
It responded:
- Identified the event handler wasn’t preventing double-clicks
- Pointed to the React component by name (HOW?!)
- Suggested adding an isLoading state and binding it to the button with disabled={isLoading}
I checked. It was right. The component name was correct. From a VIDEO.
The weird part: It occasionally hallucinates component names if your video is too long (over 5 minutes). But for quick bug hunts, nothing else I’ve used comes close.
3. Agentic Capabilities (With a Giant Asterisk)
Google’s numbers look great on paper: SWE-bench Verified jumped from 59.6% to 76.2% (Gemini 2.5 Pro to Gemini 3 Pro). In practice? It’s… complicated.
What works:
- Running terminal commands (I let it debug a Docker networking issue, and it actually used docker inspect correctly)
- Editing files (made a 12-file refactor across my codebase without breaking tests)
- Running tests and interpreting failures
What doesn’t:
- It still sometimes tries to run commands that don’t exist (git commit -fix is not a thing, Gemini)
- Gets confused if your project structure is unconventional
- Will confidently suggest deleting files that are actually critical (caught this twice; always review)
Trust, but verify. Always.
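“Trust, but verify” is easier to stick to when every command the agent proposes passes through a gate before it runs. Below is a minimal sketch of such a gate. It is not a Gemini feature. The allowlist is hypothetical, and the sketch assumes you run approved commands as an argument list rather than through a shell.

```python
import shlex

# Hypothetical allowlist: program -> subcommands an agent may run unattended.
# None means any arguments are allowed. Tune this for your own project.
AUTO_APPROVE = {
    "docker": {"inspect", "ps", "logs"},
    "git": {"status", "diff", "log", "show"},
    "pytest": None,
    "npm": {"test"},
}

# Programs that always go to a human, whatever their arguments.
ALWAYS_REVIEW = {"rm", "mv", "chmod", "chown"}

# Characters that suggest shell chaining, redirection or substitution.
SHELL_META = set(";|&<>`$")


def classify(command: str) -> str:
    """Return 'auto', 'review' or 'reject' for an agent-proposed command."""
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. an unterminated quote
        return "reject"
    if not argv:
        return "reject"
    # Anything that looks like chaining (`pytest && rm -rf /`) needs a human.
    if any(ch in SHELL_META for token in argv for ch in token):
        return "review"
    program, args = argv[0], argv[1:]
    if program in ALWAYS_REVIEW or program not in AUTO_APPROVE:
        return "review"
    allowed = AUTO_APPROVE[program]
    if allowed is None or (args and args[0] in allowed):
        return "auto"
    # Unknown subcommands, such as `git commit -fix`, stop here.
    return "review"
```

The gate doesn't try to know which flags exist, so it never rejects git commit -fix as invalid. Instead, that command (and any rm) simply never runs until a human has looked at it.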
Where it slots into a real workflow
1. Stop Writing Boilerplate (Seriously, Just Stop)
Forget “vibe coding”. Let’s talk about the stuff you actually hate doing.
My new workflow:
Instead of scaffolding another CRUD API by hand, I now do this:
“I need a FastAPI endpoint for user authentication. JWT tokens, refresh logic, PostgreSQL with SQLAlchemy. Follow our existing pattern from the /products endpoints.”
Then I upload our products module as context. Gemini 3 gets about 90% of the code right. The other 10%? Usually just import paths or environment variables.
Time saved: What used to take 2 hours now takes 20 minutes.
The catch: You need a consistent codebase pattern. If your project is chaos, Gemini 3 will reflect that chaos.
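“Upload our products module as context” can be as simple as concatenating the files into the prompt. Here is a minimal sketch that assumes the module lives in a local directory and that you send the resulting string with whatever Gemini client you already use; the client call is deliberately left out. The helpers load_context and build_prompt are hypothetical and not part of any SDK.

```python
from pathlib import Path


def load_context(context_dir: str, suffixes: tuple[str, ...] = (".py",)) -> dict[str, str]:
    """Read every file under context_dir with a matching suffix, keyed by relative path."""
    root = Path(context_dir)
    return {
        path.relative_to(root).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.suffix in suffixes
    }


def build_prompt(request: str, files: dict[str, str]) -> str:
    """Reference files first, each under a header naming it, then the task."""
    sections = [f"### File: {name}\n{body}" for name, body in sorted(files.items())]
    sections.append(f"### Task\n{request}")
    return "\n\n".join(sections)


# Usage (the path is hypothetical; send the string with your own Gemini client):
# prompt = build_prompt(
#     "I need a FastAPI endpoint for user authentication. JWT tokens, refresh logic, "
#     "PostgreSQL with SQLAlchemy. Follow our existing pattern from the /products endpoints.",
#     load_context("app/products"),
# )
```

The per-file headers make it obvious which file a generated pattern was copied from. Putting the task after the long context is a common habit for long-context prompts.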
2. Code Reviews (But Not How You Think)
I don’t trust AI for full code reviews yet. But I do use Gemini 3 as a “first pass filter.”
My setup:
- Developer opens PR
- GitHub Action sends diff to Gemini 3
- It flags obvious stuff: hardcoded credentials, missing error handling, SQL injection risks
- Posts these as automated comments
- I (human) do the real review for logic and architecture
This catches the boring stuff so I can focus on whether the code actually solves the problem.
Warning: It will sometimes flag false positives. Last week it complained about a .env file that was actually a .env.example. Don’t blindly merge based on AI feedback.
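Posting findings as comments means each one needs a file and a line number in the new version of that file. Below is a rough sketch of the diff-side plumbing, assuming git's default a/ and b/ prefixes. The added_lines helper and its skip list are hypothetical, not part of GitHub Actions or any Gemini SDK. Skipping template files is one blunt way to avoid the .env.example kind of false positive.

```python
import re
from typing import Iterator, NamedTuple

# "@@ -old_start,old_len +new_start,new_len @@": capture new_start.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


class AddedLine(NamedTuple):
    path: str
    line: int  # line number in the new version of the file
    text: str


def added_lines(diff: str, skip_suffixes: tuple[str, ...] = (".example",)) -> Iterator[AddedLine]:
    """Yield each added line of a unified diff with its new-file line number.

    Files whose names end in skip_suffixes (e.g. .env.example) are skipped.
    """
    path, new_line = None, 0
    for raw in diff.splitlines():
        if raw.startswith("+++ "):
            target = raw[4:]
            # Deleted files point at /dev/null; anything without b/ is ignored.
            path = target[2:] if target.startswith("b/") else None
            if path is not None and path.endswith(skip_suffixes):
                path = None
            continue
        hunk = HUNK_RE.match(raw)
        if hunk:
            new_line = int(hunk.group(1))
            continue
        if path is None:
            continue
        if raw.startswith("+"):
            yield AddedLine(path, new_line, raw[1:])
            new_line += 1
        elif raw.startswith(" "):
            new_line += 1  # context line: present in both versions
        # "-" lines exist only in the old file, so the counter stays put.
```

The trade-off is that a real secret pasted into an .env.example now slips past this first pass. That is one more reason the human review step stays.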
3. Legacy Code (Where It Actually Shines)
Remember that 6-year-old jQuery spaghetti nobody wants to touch? Gemini 3’s 1M-token context window is perfect for this.
What I did last month:
Uploaded our entire legacy admin panel (22 files of vanilla JS horror). Asked it to:
- Identify which files are actually used
- Create a dependency graph
- Rewrite the user management module in React
It worked. Not perfectly (I had to fix event handling bugs), but it did 70% of the grunt work.
Pro tip: Do this incrementally. Don’t ask it to rewrite your entire app. Start with one isolated module, verify it works, then move to the next.
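The first two asks (which files are used, and a dependency graph) can also be checked mechanically. That gives you a cheap way to verify what Gemini 3 tells you. Here is a rough sketch, assuming files are wired together with script src tags, require() or ES imports, and that you pass the sources in as a dict of path to text. All function names here are hypothetical.

```python
import posixpath
import re
from collections import deque
from graphlib import TopologicalSorter
from typing import Iterable, Optional

# Explicit references only: <script src="...">, require("...") and import ... from "...".
# Globals shared between plain <script> files are invisible to this.
REF_RE = re.compile(
    r"""<script[^>]*\bsrc=["']([^"']+)["']"""
    r"""|require\(\s*["']([^"']+)["']\s*\)"""
    r"""|import\s+(?:[^'"]*?\s+from\s+)?["']([^"']+)["']"""
)


def resolve(from_file: str, ref: str, known: set[str]) -> Optional[str]:
    """Map a reference to a project-relative path, or None (CDN URL, missing file)."""
    if ref.startswith("/"):
        base = ref.lstrip("/")
    else:
        base = posixpath.join(posixpath.dirname(from_file), ref)
    path = posixpath.normpath(base)
    for candidate in (path, path + ".js"):  # require("./api") -> api.js
        if candidate in known:
            return candidate
    return None


def build_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """file -> set of project files it references."""
    known = set(sources)
    graph = {}
    for name, text in sources.items():
        deps = set()
        for match in REF_RE.finditer(text):
            ref = next(g for g in match.groups() if g)
            target = resolve(name, ref, known)
            if target is not None and target != name:
                deps.add(target)
        graph[name] = deps
    return graph


def used_files(graph: dict[str, set[str]], entry_points: Iterable[str]) -> set[str]:
    """Everything reachable from the entry points (breadth-first)."""
    seen, queue = set(), deque(entry_points)
    while queue:
        node = queue.popleft()
        if node in seen or node not in graph:
            continue
        seen.add(node)
        queue.extend(graph[node])
    return seen


def migration_order(graph: dict[str, set[str]], used: set[str]) -> list[str]:
    """Dependencies before dependents; raises graphlib.CycleError on cycles."""
    subgraph = {node: graph[node] & used for node in used}
    return list(TopologicalSorter(subgraph).static_order())
```

Start used_files from the HTML pages you know are live. Anything left over is a candidate to check by hand, not proof that it's dead, because a global it defines may still be called. The first items from migration_order depend on nothing else in the project, which makes them natural picks for that first isolated module.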
So… Should You Care?
Look, I’m not switching from Claude entirely. For brainstorming and writing, Claude still feels more “human.” But for code? Gemini 3 is winning me over.
The Deep Think feature alone saves me from second-guessing architecture decisions. The multimodality is genuinely useful (not gimmicky). And the context window means I can actually load real projects, not toy examples.
My advice:
- Try it if you’re doing refactoring or working with legacy code
- Stick with Claude/GPT-4 if you need creative writing or nuanced explanations
- Don’t trust it blindly: I’ve caught it suggesting bad practices when I ask about things newer than its knowledge cutoff
The future of development is probably agentic. But we’re not there yet. Gemini 3 is a big step, though.
Now excuse me while I go rewrite my CI/CD pipeline for the third time this year.
Related reading: I tried using AI for code review on my side projects (more on letting AI take the first review pass).
P.S. If you’ve tried Gemini 3, I’d love to hear what you think. Did it hallucinate component names for you too, or is that just me? Hit me up on Twitter/X.


