## Token optimization goes Kubernetes-native
MCP Optimizer capabilities are now embedded directly in vMCP, bringing the same per-request token reduction and improved tool selection from desktop to Kubernetes-native deployments. Instead of every developer configuring a local optimizer instance, platform teams deploy it once and every connected client benefits automatically.
- On-demand tool discovery means agents no longer receive hundreds of tool descriptions in context. Instead, tools are discovered at request time and only the relevant ones are surfaced (up to 8 by default, configurable) via hybrid search (semantic + keyword), cutting token usage by 60-85% per request while improving tool selection accuracy.
- Team-wide token savings from a single deployment means the 60-85% per-request reduction the Optimizer delivers on desktop now applies to every developer connected to vMCP. Deploy it once, and the savings multiply across the entire team without anyone managing a local optimizer instance.
- No per-developer configuration required. Developers point their MCP client at the vMCP endpoint and get optimized routing automatically, with no local embedding models, no search parameter tuning, and no setup drift across the team.
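For a developer, connecting is just a matter of pointing their MCP client at the shared endpoint. As a minimal sketch, a client configuration might look like the following; the server name, endpoint URL, and transport type shown are placeholders for illustration, so check your platform team's vMCP address and your client's documentation for the exact format:

```json
{
  "mcpServers": {
    "vmcp": {
      "type": "http",
      "url": "https://vmcp.example.internal/mcp"
    }
  }
}
```

Once the client is pointed at vMCP, tool discovery and routing happen server-side, so there is nothing else to configure locally.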
The setup is GitOps-friendly: EmbeddingServer and VirtualMCPServer CRDs deploy through your existing CI/CD pipeline. For the full configuration reference and quickstart examples, see the vMCP optimizer guide. For desktop users, MCP Optimizer remains available through the ToolHive UI and CLI as before.
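To give a feel for what lands in a Git repository, here is a minimal sketch of the two custom resources. The API group, version, and spec fields below are assumptions for illustration only; the vMCP optimizer guide documents the actual schema:

```yaml
# Hypothetical manifest sketch -- field names are illustrative, not the real schema.
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: EmbeddingServer
metadata:
  name: optimizer-embeddings
spec: {}  # embedding model configuration goes here
---
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: VirtualMCPServer
metadata:
  name: team-vmcp
spec: {}  # optimizer settings and backend MCP server references go here
```

Because these are ordinary Kubernetes resources, they version, review, and roll back like the rest of your manifests.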
## Getting started
For detailed release notes, check the project repositories:
- ToolHive Runtimes (CLI and Kubernetes Operator)
- ToolHive Desktop UI
- ToolHive Cloud UI
- ToolHive Registry Server
You can find all ToolHive documentation on the Stacklok documentation site.