Google Launches "Gemini 3"
PLUS: OpenAI upgrades GPT-5 to GPT-5.1, claiming it is smarter and more conversational
A New Era Of Intelligence With Gemini 3
Gemini 3, the latest model from Google and Google DeepMind, is now live - and it’s a major leap in how you can learn, build and plan with AI.

Key Points:
Next-gen reasoning + multimodal brains - Gemini 3 is Google’s most intelligent model yet, capable of sophisticated reasoning, with a better understanding of nuance and context, and support for text, image and other inputs (a minimal API sketch follows these points).
Three main use-cases: Learn, Build, Plan - The launch emphasises you can use Gemini 3 to learn anything (e.g., from videos, handwritten notes, across languages), build anything (developers can tap richer, agent-style coding workflows) and plan anything (multi-step workflows, longer horizon, tools + agents).
Responsible & scaled release - Google says Gemini 3 has gone through “the most comprehensive set of safety evaluations” to date. It’s already rolling out across products (the Gemini app, AI Mode in Search, AI Studio, Vertex AI), while a special “Deep Think” mode will follow after extra safety review.
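If you want to try the multimodal side from code, here is a minimal sketch using the google-genai Python SDK. The model id “gemini-3-pro-preview” is my assumption, not something confirmed in the announcement, so check Google’s current docs for the exact name.

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Send an image plus a text instruction in a single request.
with open("handwritten_notes.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model id - confirm against the docs
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Turn these handwritten notes into a structured study outline.",
    ],
)
print(response.text)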
What’s in it for you?
Gemini 3 moves beyond chatbot-style text responses into much deeper reasoning and multimodal support. That means you’ll soon get tools that can handle more complex ideas, integrate images, video and code, and act across multiple steps.
For creators, that means richer possibilities: interactive visuals, code-powered workflows, turning “idea” into “prototype” faster. For learners, it means more powerful study-and-creation combos (handwritten notes → interactive visualization, foreign lecture → interactive guide).
For developers, the move toward “agentic” workflows (bots that plan and act using tools) opens up new design patterns: not just “ask the model”, but “have the model take a sequence of actions under your control”.
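To make the agentic idea concrete, here is a minimal sketch of tool use with the google-genai Python SDK, which can call a plain Python function you hand it and fold the result into its answer. The model id and the toy get_order_status function are my own placeholders, not anything from Google’s announcement.

from google import genai
from google.genai import types

def get_order_status(order_id: str) -> dict:
    """Toy tool: look up an order in a mock internal system."""
    return {"order_id": order_id, "status": "shipped", "eta_days": 2}

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Passing a Python function as a tool lets the model decide when to call it,
# receive the result, and use it in the final reply.
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model id - confirm against the docs
    contents="Where is order 1042, and what should I tell the customer?",
    config=types.GenerateContentConfig(tools=[get_order_status]),
)
print(response.text)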
On the flip side: pay attention to how you use it. With more power comes more responsibility - the safety emphasis signals that you’ll want to keep checking model outputs, guard sensitive applications, and think critically about how much autonomy you hand the model versus how much control you keep.
GPT-5.1 - A Smarter, More Conversational ChatGPT
OpenAI has just announced GPT-5.1, a major update to their flagship GPT-5 series. According to the announcement, this upgrade is all about being smarter and more enjoyable to use.

Key Points:
Two modes: Instant & Thinking – GPT-5.1 introduces Instant (fast, everyday responses) and Thinking (for deeper problems). The model adapts thinking time based on complexity.
Better speed, better interactions – For simpler tasks the model spends fewer tokens, which means faster replies and lower cost. It also features improved instruction-following and a warmer, more conversational tone.
Coding & tools improved – For developers, GPT-5.1 brings new tools like apply_patch and shell to support code edits and system commands, plus improved performance on coding benchmarks.
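As a rough sketch of what that looks like from code, here is a Responses API call with the OpenAI Python SDK. The model id “gpt-5.1”, the apply_patch/shell tool type names and the reasoning-effort setting are assumptions based on my reading of the announcement, so verify them against the official docs.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.1",  # assumed model id
    reasoning={"effort": "low"},  # assumed knob for faster, "Instant"-style replies
    tools=[
        {"type": "apply_patch"},  # assumed built-in tool type for code edits
        {"type": "shell"},        # assumed built-in tool type for system commands
    ],
    input="Fix the off-by-one bug in utils/paginate.py and outline how you'd verify it.",
)

# Tool calls (patches to apply, commands to run) come back in response.output for
# your own code to execute - this sketch only shows the request shape.
print(response.output_text)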
My Thoughts
For you - whether you’re a learner, creator or developer - GPT-5.1 signals a shift: we’re moving from simply more capable AI models to AI that fits better into your workflow and style.
If you’re learning, the “Thinking” mode means you can trust the AI a bit more when working through tough concepts.
If you’re creating (writing, designing, building), the “Instant” mode means less waiting and more momentum. The improved tone and personality mean your tool “feels nicer” to work with.
If you’re a developer, the new tools and efficiency improvements mean you can build with AI more seamlessly - edits, coding workflows and tasks that previously felt clunky may now become smoother.
An Actionable Experiment
Try switching between “Instant” and “Thinking” for different tasks. Ask yourself: what parts of my workflow could benefit from faster responses? When do I need deeper thinking? Use the release as a chance to refine how you engage with AI, not just which model you pick.
Meta launches SAM 3 and SAM 3D
Meta released SAM 3 and SAM 3D, its newest models in the “Segment Anything” family.

Key Points:
What’s new with SAM 3: This model enables open-vocabulary segmentation - you can give it text prompts (e.g., “yellow school bus”, “person wearing a red hat”) and it will segment objects in images and video. It supports detection, tracking and segmentation of multiple instances and works with visual/text prompts.
What’s new with SAM 3D: This companion model enables 3D reconstruction from a single image (or minimal input) - including objects and human bodies. It outputs textured meshes, 3D bodies/poses, and handles clutter/occlusion.
Access & ecosystem: Meta is making both models available open source (model weights, code) and has launched a “Segment Anything Playground” so creators and developers can experiment. Real-world feature integrations are already underway too (e.g., the “View in Room” AR experience on Facebook Marketplace).
My Thoughts
The bar for visual-AI capabilities is rising: text-prompted segmentation and 3D reconstruction from a single 2D image are now real. That means your creative or development workflows can start assuming richer vision capabilities.
If you’re a creator: imagine editing a video by simply telling the tool “highlight all people wearing red hats” or “make the lamp float in 3D space” — that becomes more realistic with these models.
If you’re a developer or builder: the open-source nature means you can experiment, fine-tune or integrate these models into apps, AR/VR, robotics or imaging workflows.
Actionable step: Pick one workflow you do (e.g., “segmenting objects in client images”, “creating 3D assets from product photos”, “editing video by object”) and try it with SAM 3 or SAM 3D. See how it changes the time, the precision, or the creative freedom.
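If you want to script that experiment rather than use the Playground, here is a deliberately hypothetical sketch: the loader and segment call below are placeholders I made up to show the shape of text-prompted segmentation, not Meta’s actual API, so check the official SAM 3 release for the real entry points.

from PIL import Image

def segment_by_text(model, image_path: str, prompt: str):
    """Open-vocabulary segmentation: return every instance mask matching a text prompt."""
    image = Image.open(image_path).convert("RGB")
    # Placeholder call - the real SAM 3 release defines its own predict/segment method.
    return model.segment(image, text_prompt=prompt)  # e.g. a list of {mask, box, score}

# Example usage once you have a real SAM 3 model object (names below are placeholders):
# model = load_sam3_checkpoint("sam3.pt")
# hats = segment_by_text(model, "street.jpg", "person wearing a red hat")
# print(f"{len(hats)} matching instances found")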
Google Launches Nano Banana Pro With 4K Resolution And Web Search
Google has just launched Nano Banana Pro, the next-gen image-generation & editing model built on Gemini 3 Pro and layered with advanced features (think 4K output, real-time web-search integration, multilingual text rendering).

Key Points:
High-fidelity visuals + studio controls - Nano Banana Pro supports up to 4K resolution (and 2K as well) for generated images, a significant step up from typical consumer AI image outputs. You also get finer controls: camera angle, lighting, depth of field, color grading - all built into prompt-driven workflows.
Web-grounded reasoning + real-world knowledge - The model is linked to Google Search’s knowledge base, so you can prompt it to generate infographics, diagrams or visuals that use real-time or real-world data/context.
Improved text rendering + multilingual support - One of the standout improvements: you can now generate images with legible text (short or full paragraphs), in multiple languages, with proper fonts/typography. Excellent for posters, mock-ups, localized content.
Actionable step: Pick a small project - say, generating a localized poster or infographic for something you care about. Use Nano Banana Pro (or its trial) and test: how many prompts/refinements did you need? How legible is the text? Does the visual align with real-world data? Then reflect: What could you do next week to lean into image + knowledge + generation?
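Here is a minimal sketch of that experiment with the google-genai Python SDK. The model id “gemini-3-pro-image-preview” and the Search-grounding config are assumptions based on the announcement, so double-check them in Google’s docs before relying on this.

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

prompt = (
    "A poster for a neighborhood coding workshop with the headline "
    "'Aprende a programar con IA' in clean sans-serif type, soft morning light, "
    "shallow depth of field."
)

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",  # assumed Nano Banana Pro model id
    contents=prompt,
    config=types.GenerateContentConfig(
        # Assumed: Search grounding is how web-grounded visuals are enabled via the API.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

# Save any returned image parts to disk.
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data is not None:
        with open(f"poster_{i}.png", "wb") as f:
            f.write(part.inline_data.data)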
Thank you for reading.