AI in 2025: The terminal becomes cool again

Published: January 20, 2026

A look back at some of the big advancements in 2025 along with a few of our predictions for 2026.


2025 was quite a year. Looking back at where we started in January and where we find ourselves now, it's almost hard to believe it's only been twelve months. From the way development teams ship code to how enterprises consume AI, 2025 was a year of tremendous pace (which was sometimes hard to keep up with!). Here's our take on the trends, tools and transformations that defined the year and what they mean as we head into 2026.

Enter Claude Code

Perhaps no shift has been more profound for software teams than the emergence of agentic coding tools that live in the terminal. Claude Code led this charge, becoming generally available in May and reaching $1 billion in revenue in just six months. The tool doesn't try to replace your IDE – it enhances your existing workflow by leveraging existing CLI tools, deployment scripts, MCP servers (more on that later!), and skills.

What makes these tools genuinely transformative isn't just code generation. It's the ability to hand off entire workflows – triaging issues, writing code, running tests and submitting pull requests – all from natural language commands. As one developer from the Claude Code team put it in The Pragmatic Engineer, the philosophy behind Claude Code is to let people "feel the model as raw as possible" rather than cluttering the experience with unnecessary scaffolding.
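As a rough sketch of what that hand-off can look like, a script can drive Claude Code's non-interactive print mode (claude -p) and capture the output – the prompt below is purely illustrative:

```python
import subprocess

# Hand a whole task to Claude Code from a script via its non-interactive
# print mode (claude -p). The prompt is illustrative; the agent triages,
# edits and runs tools inside the current repository.
result = subprocess.run(
    ["claude", "-p", "Triage the failing tests, fix them and summarise the changes"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```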

The competition hasn't sat idle. OpenAI followed with Codex CLI, Google launched Gemini CLI, AWS has Kiro CLI, Cursor has their own CLI… and so on.

Here's what we've observed at Instil: teams with strong engineering practices benefit most from these tools. Test-driven development has become more important than ever – writing tests first, then having the agent implement a first pass, produces remarkably reliable results. The teams struggling are those trying to use agents as a shortcut around good practices rather than an accelerant for them.
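As a minimal illustration of that test-first loop, an engineer might write failing tests like these – the invoice module and parse_invoice function are hypothetical – and then ask the agent to implement code that makes them pass:

```python
# test_invoice.py – hypothetical tests written *before* asking the agent
# to implement parse_invoice. The tests pin down the behaviour; the agent
# iterates on the implementation until they pass.
import pytest

from invoice import parse_invoice  # hypothetical module the agent will create


def test_parse_invoice_extracts_number_and_total():
    invoice = parse_invoice("Invoice #1042\nTotal: £1,250.00")
    assert invoice.number == 1042
    assert invoice.total_pence == 125_000


def test_parse_invoice_rejects_missing_total():
    with pytest.raises(ValueError):
        parse_invoice("Invoice #1042")
```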

Anthropic's own research backs this up. Surveying their engineers, they found that 27% of Claude-assisted work consists of tasks that simply wouldn't have been done otherwise – "papercut fixes", exploratory work and nice-to-have tooling that previously fell off the priority list. Engineers reported using Claude in 60% of their work with a 50% productivity boost, but critically, most said they can only fully delegate 0–20% of their work. The rest requires active collaboration, verification and, most importantly, human review.

A glimpse of multi-agent orchestration

At GitHub Universe 2025 in October, GitHub announced Agent HQ and we're genuinely excited about where this is heading. The concept is simple: rather than forcing developers to choose between competing AI agents, GitHub is positioning itself as the orchestration layer that unites them all.

Agent HQ introduces Mission Control, a unified command centre accessible across GitHub, VS Code, mobile and the CLI. From here, you can assign work to multiple agents simultaneously and track progress across tasks. GitHub have stated that agents from Anthropic, OpenAI, Google and xAI will be integrated into the platform and made available as part of existing Copilot subscriptions.

The enterprise governance features are particularly interesting. Custom agents, defined via prompt files at the organisation or repository level, allow teams to encode their standards and create a new generation of tools capable of reviewing code for security issues, analysing test quality or even suggesting missing tests.

We see Agent HQ as the beginning of something bigger. Today, it's about managing coding agents. Tomorrow, it could be the standard interface for orchestrating any kind of AI-powered work across software engineering teams. The infrastructure being built now will matter enormously.

From vibe coding to AI-assisted engineering

Remember when Andrej Karpathy coined "vibe coding" back in February 2025? The term took off so quickly it became Collins Dictionary's Word of the Year. The original concept was playful: fully give in to the vibes, embrace experimentation and forget that the code even exists.

But as the year progressed, the industry quickly learned some hard lessons. An August survey of 18 CTOs found 16 had experienced production disasters directly caused by AI-generated code. Whilst vibe coding may be great for demos and prototypes, pushing an idea through to production demands more rigour. AI-assisted engineering treats coding agents as a pairing partner, enhancing and augmenting an engineer's skills rather than replacing them.

Small models, big impact

If there's one technical trend that excited us most last year, it's the rise of small language models (SLMs) for agentic applications. A paper from NVIDIA researchers made the compelling case that most agentic tasks are "repetitive, scoped and non-conversational" – perfectly suited to specialised, efficient models rather than massive general-purpose ones. Shortly after, Cognizant published a paper outlining how they solved a million-step task using LLMs with zero errors, and it stands to reason that this approach could eventually be applied to SLMs too.

For edge AI and conversational agents, this has the potential to be massive. Google's Gemma 3n arrived as one of the first multimodal on-device small language models, supporting text, image, video and audio inputs. AI21's Jamba Reasoning 3B handles 250k token context windows on consumer devices.
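To give a sense of how little ceremony local inference now involves, here's a minimal sketch using the Hugging Face transformers pipeline – the model ID is illustrative, and any small instruction-tuned model can be swapped in (multimodal models such as Gemma 3n have their own pipelines):

```python
# A sketch of local inference with a small open model via Hugging Face
# transformers (pip install transformers torch). The model ID is illustrative –
# swap in whichever SLM suits your device and licence requirements.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative small instruct model
)

result = generator(
    "List three benefits of running language models on-device.",
    max_new_tokens=128,
)
print(result[0]["generated_text"])
```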

This is going to be a particularly interesting area to monitor this year as the potential is huge: agentic workflows in secure environments, broader access for developing economies and the possibility of another wave of innovation.

The protocol that won

When Anthropic introduced the Model Context Protocol in November 2024, it was solving a genuine problem: how do AI models connect to external tools and data without building bespoke integrations for every system? A year later, MCP has become the de facto standard.

The adoption story is remarkable. OpenAI officially adopted MCP in March, integrating it across the ChatGPT desktop app, Agents SDK and Responses API. Google confirmed MCP support for Gemini in April. Microsoft released Playwright-MCP for browser automation and MCP connectors for the M365 Copilot ecosystem. The MCP Registry now lists around 2,000 servers.

What makes MCP powerful is its simplicity. Built on JSON-RPC 2.0, it provides a universal interface for managing tools, accessing knowledge and sharing prompts. Atlassian, Stripe, GitHub, Notion, Hugging Face and Postman all have official MCP servers. If you're building an AI application that needs to talk to external systems, MCP is a good option. OpenAI have even announced that their app ecosystem is built on top of MCP.
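To illustrate that simplicity, here's what a minimal MCP server looks like using the official Python SDK's FastMCP helper – the weather tool is a toy stub rather than a real integration:

```python
# A minimal MCP server using the official Python SDK (pip install "mcp[cli]").
# The weather lookup is a stub – a real server would call an actual API.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-weather")


@mcp.tool()
def get_weather(city: str) -> str:
    """Return a (stubbed) weather report for the given city."""
    return f"It is currently 12°C and raining in {city}."


if __name__ == "__main__":
    # Serve over stdio so MCP clients (Claude Code, IDEs, etc.) can connect.
    mcp.run(transport="stdio")
```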

It's worth noting that Google also released their Agent-to-Agent (A2A) protocol for cross-agent communication, and the two protocols may evolve to complement each other. But for grounding models in private data and enterprise tools, MCP won 2025.

Computer use agents

One of 2025's most significant developments was AI systems that can operate computers much as humans do – clicking buttons, navigating websites, filling forms and manipulating applications through visual interfaces rather than APIs. This "computer use" capability represents a fundamental shift from AI that generates content to AI that takes action.

Anthropic's Computer Use, available through their API, gives Claude the ability to control computers by "looking" at screenshots and performing actions through virtual keyboard and mouse inputs. Unlike traditional automation that requires custom API integrations for each system, Computer Use works with any visual interface – legacy applications, web apps, desktop software. Claude for Chrome launched in August as a pilot browser extension allowing users to instruct Claude to take actions on their behalf within Chrome.
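As a sketch of what a computer use request looks like through the Anthropic API – tool versions and beta flags change between releases, so treat these identifiers as illustrative and check the current docs:

```python
# A sketch of a computer use request via the Anthropic Python SDK.
# Tool version strings and beta flags evolve between releases.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=[{
        "type": "computer_20250124",  # virtual screen, keyboard and mouse
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the browser and check the news."}],
    betas=["computer-use-2025-01-24"],
)

# The model replies with tool_use blocks (screenshots, clicks, keystrokes)
# that a harness must execute and feed back as tool results in a loop.
print(response.content)
```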

OpenAI's Operator, powered by their Computer-Using Agent (CUA) model, initially took a different approach by running in a cloud-hosted virtual browser. It handles tasks like booking restaurants, ordering groceries and filling forms without requiring constant supervision. However, OpenAI have since released their Atlas browser, which moves closer to Anthropic's approach with an agent mode that can interact with the user's local browser.

Current benchmarks tell an honest story: computer use models significantly lag human performance on the OSWorld benchmark, with initial scores around 20% and leading models now hitting around 60% success rates. These remain early days, with reliability issues and execution speeds often slower than human operators, but the trajectory is clear. Tasks that previously required API integrations, custom scripts or human intervention can increasingly be delegated to AI systems that interact with software the same way we do.

Computer use agents and agentic browsers also bring security concerns: prompt injection poses a real risk if agents are allowed to run unchecked, though Anthropic has reported reducing prompt injection attack success rates to 11.2% through improved safety mitigations. This will be an area of particular interest for cyber security teams in the next year.

The curious case of Nano Banana

We can't write a 2025 review without mentioning Nano Banana – Google's gloriously named image editing model that launched in August and had people genuinely going bananas. Built on Gemini 2.5 Flash, it became the top-rated image editing model in the world, enabling transformations that maintain character consistency across edits.

In November, Google followed up with Nano Banana Pro, built on Gemini 3 Pro, adding studio-quality precision, multilingual text rendering and the ability to work with up to 14 different images while maintaining character consistency. Turn yourself into a figurine, blend photos seamlessly, try on different hairstyles – the viral potential was off the charts.

Why mention it in a serious industry recap? Because Nano Banana drove meaningful adoption growth for Gemini. Some reports show that Gemini's monthly active users jumped 30% from August to November, with Nano Banana specifically cited as a driver. Nano Banana has also made its way into the likes of NotebookLM, which is now capable of producing detailed visuals in slide decks covering complex topics. Fun features matter.
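For the curious, editing an image with Nano Banana takes only a few lines via the google-genai SDK – the model ID reflects naming at the time of writing and may change:

```python
# A sketch of image editing with Nano Banana through the google-genai SDK
# (pip install google-genai pillow). Model IDs evolve, so check the docs.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[
        "Turn the person in this photo into a collectible figurine",
        Image.open("photo.png"),
    ],
)

# The response interleaves text and image parts; save any returned images.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("figurine.png")
```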

When AI improves AI

DeepMind's AlphaEvolve, unveiled in May, represents something genuinely new: an AI system that can discover and optimise algorithms, including the algorithms used to train itself.

The practical impacts at Google are already substantial. AlphaEvolve discovered scheduling heuristics that continuously recover 0.7% of Google's worldwide compute resources. It proposed circuit modifications for an upcoming TPU that passed robust verification. Most remarkably, it found optimisations to matrix multiplication that accelerated Gemini's training by 1%.

This feels like a preview of where AI development is heading: systems that improve their own foundations. Whilst we were researching this article, Google also released a preview of AlphaEvolve as a managed service on GCP.

Looking forward

Predictions can be hit and miss but here's what we're watching in 2026:

Agentic AI goes mainstream. The infrastructure is being built now – Agent HQ, MCP, skills, CLI agents – for AI systems that can take sustained action over hours or days. Expect to see production deployments where agents handle significant workflows with human oversight rather than human execution.

The "full-stack AI engineer" emerges. The skills gap is shifting. Engineers who can effectively collaborate with AI tools, knowing when to delegate and when to intervene will be increasingly valuable. The best performers won't be those who resist AI or those who blindly accept its output but those who've developed intuition for productive collaboration.

Edge AI becomes practical. Small language models running on-device, with all the privacy and latency benefits that implies, will move from demos to production. We expect to see voice assistants, industrial applications and consumer products that work without cloud connectivity.

Consolidation and standards. The current fragmentation (multiple protocols, competing agent platforms, incompatible tools) will likely consolidate. Anthropic’s decision to create open standards for MCP and skills has paid off massively. GitHub's Agent HQ approach suggests the orchestration layer is stabilising. The newly announced Agentic AI Foundation will hopefully also bring further standardisation across the big players.

Workforce questions intensify. McKinsey's recent survey found 32% of respondents expect workforce decreases of 3% or more in the coming year due to AI, while 13% expect increases. The reality is likely more nuanced with roles changing rather than disappearing, new skills becoming essential and old ones becoming automated. Companies that invest in reskilling and thoughtful transition will navigate this better than those that don't.

Final thoughts

2025 was the year AI went from impressive to essential. Not in a hype-cycle way, but in the practical sense of becoming embedded in daily workflows across industries. The tools work. The infrastructure exists.

At Instil, we've spent the year helping customers navigate this transition – building AI-powered applications, integrating agentic capabilities, making sense of rapidly evolving options. What we've learned is that the organisations succeeding aren't the ones chasing every new model release. They're the ones thinking carefully about which problems AI can actually solve, building robust practices around AI collaboration and investing in their people alongside their technology.

The technology will keep advancing. The real work is figuring out how to use it well, sustainably.

Chris van Es

Head of AI