# Gemini 3: The Developer's Guide to the New AI Era

> A week with Gemini 3 Pro: Deep Think on a real migration, a bug diagnosed from a 2-minute video, and the agentic stuff that still needs a babysitter.

I've been testing Gemini 3 Pro for the past week, and honestly? I'm annoyed it's this good.

Not because it's bad. The opposite. I'd finally gotten comfortable with my Claude/GPT-4 workflow, and now Google drops this on November 18th and throws everything into chaos again. Classic.

But here's the thing: after forcing myself to rebuild a feature I'd already finished (yeah, I rewrote working code just to test this), I get the hype. This isn't your typical "10% better at benchmarks" release. Deep Think and actual multimodality (not the fake kind) change how you can work.

Let me show you what I mean.

## What's actually different this time

### 1. Deep Think, or why it actually pauses now

You know how previous models would just... start vomiting tokens? Mid-thought, zero planning, just word prediction on steroids?

Gemini 3 actually stops. It thinks. You'll see "[Thinking...]" in the API response, and at first I thought my connection dropped. Nope. It's genuinely spending compute time planning its answer before committing.

**Where I've actually used this:**

Last Thursday, I asked it to plan our migration from a monolithic Angular app to micro-frontends. Instead of immediately dumping code, it:

- Asked about our deployment pipeline (I didn't even prompt for this)
- Identified 3 legacy dependencies that would break
- Suggested a phased rollout plan that actually made sense

Could Claude do this? Maybe. But Gemini 3 did it without me babysitting the prompts.

**The catch:** It's slower. If you're used to instant responses, the 3-5 second thinking pause feels eternal. But for architecture decisions? I'll take accuracy over speed.

### 2. Multimodality without the duct tape

Every model claims "multimodality" now. Usually, it means they duct-taped three different models together and hoped for the best.

Gemini 3 is different: one model handles everything. Text, code, video, audio, images. Same weights, same architecture.

**Real test I did:**

I recorded a 2-minute Loom video of a UI bug (button wouldn't disable after clicking). No transcript, no code snippets. Just me clicking around and complaining in Spanish.

Uploaded it. Asked: "What's broken?"

It responded:

- Identified the event handler wasn't preventing double-clicks
- Pointed to the React component by name (HOW?!)
- Suggested adding an `isLoading` state with `disabled={isLoading}`

I checked. It was right. The component name was correct. From a VIDEO.

**The weird part:** It occasionally hallucinates component names if your video is too long (over 5 minutes). But for quick bug hunts, nothing else I've used comes close.
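One workaround for long recordings is to cut them into shorter clips before uploading. Here's a minimal sketch that only computes the cut points. The `clip_windows` helper is hypothetical, and the 300-second default reflects the 5-minute mark mentioned above, which is an observation and not a documented limit:

```python
def clip_windows(duration_s, max_len_s=300.0, overlap_s=10.0):
    """Return (start, end) pairs in seconds that cover the whole recording.

    max_len_s=300 mirrors the ~5 minute mark where hallucinations showed up
    in my testing; treat it as a tunable guess, not an official threshold.
    """
    duration_s = float(duration_s)
    if duration_s <= 0:
        return []
    if not 0 <= overlap_s < max_len_s:
        raise ValueError("overlap_s must be >= 0 and smaller than max_len_s")

    windows = []
    start = 0.0
    while True:
        end = min(start + max_len_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        # Step back a little so a bug isn't split across two clips.
        start = end - overlap_s
    return windows


print(clip_windows(700))
# [(0.0, 300.0), (290.0, 590.0), (580.0, 700.0)]
```

Feed each window to whatever tool you use for trimming, then ask the same question per clip.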

### 3. Agentic Capabilities (With a Giant Asterisk)

Google's numbers look great on paper: SWE-bench Verified jumped from 59.6% to 76.2% (Gemini 2.5 Pro to Gemini 3 Pro). In practice? It's... complicated.

**What works:**

- Running terminal commands (I let it debug a Docker networking issue, and it actually used `docker inspect` correctly)
- Editing files (made a 12-file refactor across my codebase without breaking tests)
- Running tests and interpreting failures

**What doesn't:**

- It still sometimes tries to run commands that don't exist (`git commit -fix` is not a thing, Gemini)
- Gets confused if your project structure is unconventional
- Will confidently suggest deleting files that are actually critical (caught this twice; always review)

Trust, but verify. Always.
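If you want "verify" to be more than a habit, one option is a small gate between the agent and your shell. The sketch below is an assumption about how such a gate could look, not something Gemini ships: `vet_command` and both deny lists are hypothetical. It rejects commands whose executable isn't on your `PATH` and sends anything destructive or compound to a human. It does not validate flags, so something like `git commit -fix` would still get through to `git` itself.

```python
import shlex
import shutil

# Hypothetical policy: anything that can destroy work needs a human.
DESTRUCTIVE = {"rm", "rmdir", "mv", "dd", "truncate"}
DESTRUCTIVE_GIT = {"clean", "reset", "rm"}
SHELL_METACHARS = ";|&<>`$"


def vet_command(command, which=shutil.which):
    """Classify an agent-proposed command as 'run', 'review', or 'reject'."""
    try:
        parts = shlex.split(command)
    except ValueError:
        return "reject"  # e.g. unbalanced quotes
    if not parts:
        return "reject"

    program = parts[0]
    if which(program) is None:
        return "reject"  # the executable does not exist on PATH

    # Pipelines, chaining, and substitutions are too hard to reason about here.
    if any(ch in command for ch in SHELL_METACHARS):
        return "review"
    if program in DESTRUCTIVE:
        return "review"
    if program == "git" and len(parts) > 1 and parts[1] in DESTRUCTIVE_GIT:
        return "review"
    return "run"
```

The `which` parameter exists only so the check can be tested without depending on what's installed on the machine.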

## Where it slots into a real workflow

### 1. Stop Writing Boilerplate (Seriously, Just Stop)

Forget "vibe coding". Let's talk about the stuff you actually hate doing.

**My new workflow:**

Instead of scaffolding another CRUD API by hand, I now do this:

> "I need a FastAPI endpoint for user authentication. JWT tokens, refresh logic, PostgreSQL with SQLAlchemy. Follow our existing pattern from the `/products` endpoints."

Then I upload our products module as context. Gemini 3 generates code that's about 90% correct. The other 10%? Usually just import paths or environment variables.

**Time saved:** What used to take 2 hours now takes 20 minutes.

**The catch:** You need a consistent codebase pattern. If your project is chaos, Gemini 3 will reflect that chaos.
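If you'd rather script the "upload the module as context" step than paste files by hand, something like this works as a starting point. It's a sketch under assumptions: `build_context` and `build_prompt` are hypothetical helpers, the `# File:` header format is arbitrary, and sending the result to the model is left to whatever client you use.

```python
from pathlib import Path


def build_context(root, suffixes=(".py",)):
    """Concatenate source files under root, each prefixed with its relative path."""
    base = Path(root)
    parts = []
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(base).as_posix()
            text = path.read_text(encoding="utf-8")
            parts.append(f"# File: {rel}\n{text}")
    return "\n\n".join(parts)


def build_prompt(task, context):
    """Put the request first, then the existing code the model should imitate."""
    return f"{task}\n\nExisting code to use as the pattern:\n\n{context}"
```

Keeping the file paths in the context is the part that matters: it gives the model a fighting chance at getting import paths right.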

### 2. Code Reviews (But Not How You Think)

I don't trust AI for full code reviews yet. But I do use Gemini 3 as a "first pass filter."

**My setup:**

1. Developer opens PR
2. GitHub Action sends diff to Gemini 3
3. It flags **obvious** stuff: hardcoded credentials, missing error handling, SQL injection risks
4. Posts these as automated comments
5. I (human) do the real review for logic and architecture

This catches the boring stuff so I can focus on whether the code actually solves the problem.

**Warning:** Expect false positives. Last week it complained about a `.env` file that was actually a `.env.example`. Treat its comments as hints, not verdicts.
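Here's a minimal sketch of what steps 2 and 3 could look like, not the exact setup described above. It splits a unified diff by file, keeps only the added lines, and skips template files such as `.env.example` so they never reach the model. Everything named here is an assumption: `ask_model` is a placeholder for whichever client call you use, and the `SKIP_SUFFIXES` list and the prompt wording are made up for illustration.

```python
import re

# Hypothetical skip list: template files that look like secrets but aren't.
SKIP_SUFFIXES = (".example", ".sample", ".template")


def added_lines_by_file(diff):
    """Map each changed file to the lines the PR adds (git unified diff)."""
    files = {}
    current = None
    for line in diff.splitlines():
        if line.startswith("diff --git "):
            current = None  # new file section; wait for its '+++' header
            continue
        header = re.match(r"^\+\+\+ b/(.+)$", line)
        if header:
            path = header.group(1)
            current = None if path.endswith(SKIP_SUFFIXES) else path
            if current is not None:
                files[current] = []
            continue
        if current is not None and line.startswith("+"):
            files[current].append(line[1:])
    return files


def first_pass_review(diff, ask_model):
    """Ask the model about each file's added lines; return comments per file."""
    comments = {}
    for path, lines in added_lines_by_file(diff).items():
        if not lines:
            continue
        prompt = (
            "Flag only obvious problems (hardcoded credentials, missing error "
            "handling, SQL injection risks). Reply 'OK' if there are none.\n"
            f"File: {path}\n" + "\n".join(lines)
        )
        reply = ask_model(prompt).strip()
        if reply != "OK":
            comments[path] = reply
    return comments
```

Passing `ask_model` in as a function keeps the diff handling testable without an API key, and lets you swap models without touching the filter.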

### 3. Legacy Code (Where It Actually Shines)

Remember that 6-year-old jQuery spaghetti nobody wants to touch? Gemini 3's 1M-token context window is perfect for this.

**What I did last month:**

Uploaded our entire legacy admin panel (22 files of vanilla JS horror). Asked it to:

- Identify which files are actually used
- Create a dependency graph
- Rewrite the user management module in React

It worked. Not perfectly (I had to fix event handling bugs), but it did 70% of the grunt work.

**Pro tip:** Do this incrementally. Don't ask it to rewrite your entire app. Start with one isolated module, verify it works, then move to the next.
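Before trusting a model's answer to "which files are actually used?", it helps to have a rough answer of your own to compare against. This is a deliberately naive sketch, assuming unique file names and references that appear as string literals in `import`, `require()`, or `<script src>`. `build_graph` and `unused_files` are hypothetical helpers, and dynamically built paths will be missed.

```python
import posixpath
import re

PATTERNS = [
    re.compile(r"""\bfrom\s+['"]([^'"]+)['"]"""),            # import x from "./a.js"
    re.compile(r"""\bimport\s+['"]([^'"]+)['"]"""),          # import "./a.js"
    re.compile(r"""\brequire\(\s*['"]([^'"]+)['"]"""),       # require("./a.js")
    re.compile(r"""<script[^>]*\bsrc=['"]([^'"]+)['"]"""),   # <script src="a.js">
]


def build_graph(sources):
    """Map each file name to the set of known files it references."""
    graph = {name: set() for name in sources}
    for name, text in sources.items():
        for pattern in PATTERNS:
            for ref in pattern.findall(text):
                target = posixpath.basename(ref)  # assumes unique file names
                if target in sources and target != name:
                    graph[name].add(target)
    return graph


def unused_files(sources, entry_points):
    """Return files that cannot be reached from any entry point."""
    graph = build_graph(sources)
    seen = set()
    stack = [entry for entry in entry_points if entry in graph]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node] - seen)
    return set(graph) - seen
```

Here `sources` is a plain dict of file name to file contents, so you can load it however you like. Files that come back as unused are candidates to double-check, not to delete on sight.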

## So... Should You Care?

Look, I'm not switching from Claude entirely. For brainstorming and writing, Claude still feels more "human." But for code? Gemini 3 is winning me over.

The Deep Think feature alone saves me from second-guessing architecture decisions. The multimodality is genuinely useful (not gimmicky). And the context window means I can actually load real projects, not toy examples.

**My advice:**

- **Try it** if you're doing refactoring or working with legacy code
- **Stick with Claude/GPT-4** if you need creative writing or nuanced explanations
- **Don't trust it blindly:** I've caught it suggesting bad practices for anything newer than its knowledge cutoff

The future of development is probably agentic. But we're not there yet. Gemini 3 is a big step, though.

Now excuse me while I go rewrite my CI/CD pipeline for the third time this year.

**Related reading:** more on letting AI take the first review pass: [I tried using AI for code review on my side projects](/en/blog/gemini-flash-code-review-automation/).

---

**P.S.** If you've tried Gemini 3, I'd love to hear what you think. Did it hallucinate component names for you too, or is that just me? Hit me up on [Twitter/X](https://x.com/garbarok).